To my parents, for everything


Preface to the First Edition 


This text originated from the lecture notes I gave teaching the honours undergraduate- 
level real analysis sequence at the University of California, Los Angeles, in 2003. 
Among the undergraduates here, real analysis was viewed as being one of the most 
difficult courses to learn, not only because of the abstract concepts being introduced 
for the first time (e.g., topology, limits, measurability, etc.), but also because of the 
level of rigour and proof demanded of the course. Because of this perception of 
difficulty, one was often faced with the difficult choice of either reducing the level 
of rigour in the course in order to make it easier, or to maintain strict standards and 
face the prospect of many undergraduates, even many of the bright and enthusiastic 
ones, struggling with the course material. 

Faced with this dilemma, I tried a somewhat unusual approach to the subject. 
Typically, an introductory sequence in real analysis assumes that the students are 
already familiar with the real numbers, with mathematical induction, with elementary 
calculus, and with the basics of set theory, and then quickly launches into the heart 
of the subject, for instance the concept of a limit. Normally, students entering this 
sequence do indeed have a fair bit of exposure to these prerequisite topics, though 
in most cases the material is not covered in a thorough manner. For instance, very 
few students were able to actually define a real number, or even an integer, properly, 
even though they could visualize these numbers intuitively and manipulate them 
algebraically. This seemed to me to be a missed opportunity. Real analysis is one 
of the first subjects (together with linear algebra and abstract algebra) that a student 
encounters, in which one truly has to grapple with the subtleties of a truly rigorous 
mathematical proof. As such, the course offered an excellent chance to go back to 
the foundations of mathematics, and in particular the opportunity to do a proper and 
thorough construction of the real numbers. 

Thus the course was structured as follows. In the first week, I described some 
well-known “paradoxes” in analysis, in which standard laws of the subject (e.g., 
interchange of limits and sums, or sums and integrals) were applied in a non-rigorous 
way to give nonsensical results such as 0 = 1. This motivated the need to go back to 
the very beginning of the subject, even to the very definition of the natural numbers, 
and check all the foundations from scratch. For instance, one of the first homework 
assignments was to check (using only the Peano axioms) that addition was associative
for natural numbers (i.e., that (a + b) + c = a + (b + c) for all natural numbers
a, b, c; see Exercise 2.2.1). Thus even in the first week, the students had to write
rigorous proofs using mathematical induction. After we had derived all the basic 
properties of the natural numbers, we then moved on to the integers (initially defined 
as formal differences of natural numbers); once the students had verified all the basic 
properties of the integers, we moved on to the rationals (initially defined as formal 
quotients of integers); and then from there we moved on (via formal limits of Cauchy 
sequences) to the reals. Around the same time, we covered the basics of set theory, 
for instance demonstrating the uncountability of the reals. Only then (after about ten 
lectures) did we begin what one normally considers the heart of undergraduate real 
analysis—limits, continuity, differentiability, and so forth. 

The response to this format was quite interesting. In the first few weeks, the 
students found the material very easy on a conceptual level, as we were dealing 
only with the basic properties of the standard number systems. But on an intellectual 
level it was very challenging, as one was analyzing these number systems from a 
foundational viewpoint, in order to rigorously derive the more advanced facts about 
these number systems from the more primitive ones. One student told me how diffi- 
cult it was to explain to his friends in the non-honours real analysis sequence (a) 
why he was still learning how to show why all rational numbers are either posi- 
tive, negative, or zero (Exercise 4.2.4), while the non-honours sequence was already 
distinguishing absolutely convergent and convergent series, and (b) why, despite this, 
he thought his homework was significantly harder than that of his friends. Another 
student commented to me, quite wryly, that while she could obviously see why one 
could always divide a natural number n by a positive integer q to give a quotient
a and a remainder r less than q (Exercise 2.3.5), she still had, to her frustration,
much difficulty in writing down a proof of this fact. (I told her that later in the 
course she would have to prove statements for which it would not be as obvious 
to see that the statements were true; she did not seem to be particularly consoled 
by this.) Nevertheless, these students greatly enjoyed the homework, as when they 
did persevere and obtain a rigorous proof of an intuitive fact, it solidified the link
in their minds between the abstract manipulations of formal mathematics and their 
informal intuition of mathematics (and of the real world), often in a very satisfying 
way. By the time they were assigned the task of giving the infamous “epsilon and 
delta” proofs in real analysis, they had already had so much experience with formal- 
izing intuition, and in discerning the subtleties of mathematical logic (such as the 
distinction between the “for all” quantifier and the “there exists” quantifier), that 
the transition to these proofs was fairly smooth, and we were able to cover material 
both thoroughly and rapidly. By the tenth week, we had caught up with the non- 
honours class, and the students were verifying the change of variables formula for 
Riemann-Stieltjes integrals, and showing that piecewise continuous functions were 
Riemann integrable. By the conclusion of the sequence in the twentieth week, we 
had covered (both in lecture and in homework) the convergence theory of Taylor 
and Fourier series, the inverse and implicit function theorems for continuously
differentiable functions of several variables, and established the dominated convergence
theorem for the Lebesgue integral.

In order to cover this much material, many of the key foundational results were 
left to the student to prove as homework; indeed, this was an essential aspect of the 
course, as it ensured the students truly appreciated the concepts as they were being 
introduced. This format has been retained in this text; the majority of the exercises 
consist of proving lemmas, propositions and theorems in the main text. Indeed, I 
would strongly recommend that one do as many of these exercises as possible—and 
this includes those exercises proving “obvious” statements—if one wishes to use this 
text to learn real analysis; this is not a subject whose subtleties are easily appreciated 
just from passive reading. Most of the chapter sections have a number of exercises, 
which are listed at the end of the section. 

To the expert mathematician, the pace of this book may seem somewhat slow, 
especially in early chapters, as there is a heavy emphasis on rigour (except for those 
discussions explicitly marked “Informal”), and justifying many steps that would
ordinarily be quickly passed over as being self-evident. The first few chapters develop (in
painful detail) many of the “obvious” properties of the standard number systems, for 
instance that the sum of two positive real numbers is again positive (Exercise 5.4.1), 
or that given any two distinct real numbers, one can find a rational number between
them (Exercise 5.4.5). In these foundational chapters, there is also an emphasis on 
non-circularity—not using later, more advanced results to prove earlier, more prim- 
itive ones. In particular, the usual laws of algebra are not used until they are derived 
(and they have to be derived separately for the natural numbers, integers, rationals, 
and reals). The reason for this is that it allows the students to learn the art of abstract 
reasoning, deducing true facts from a limited set of assumptions, in the friendly and 
intuitive setting of number systems; the payoff for this practice comes later, when one 
has to utilize the same type of reasoning techniques to grapple with more advanced 
concepts (e.g., the Lebesgue integral). 

The text here evolved from my lecture notes on the subject, and thus is very much 
oriented towards a pedagogical perspective; much of the key material is contained 
inside exercises, and in many cases I have chosen to give a lengthy and tedious, but 
instructive, proof instead of a slick abstract proof. In more advanced textbooks, the 
student will see shorter and more conceptually coherent treatments of this material, 
and with more emphasis on intuition than on rigour; however, I feel it is important to 
know how to do analysis rigorously and “by hand” first, in order to truly appreciate 
the more modern, intuitive and abstract approach to analysis that one uses at the 
graduate level and beyond. 

The exposition in this book heavily emphasizes rigour and formalism; however,
this does not necessarily mean that lectures based on this book have to proceed the 
same way. Indeed, in my own teaching I have used the lecture time to present the 
intuition behind the concepts (drawing many informal pictures and giving examples), 
thus providing a complementary viewpoint to the formal presentation in the text. 
The exercises assigned as homework provide an essential bridge between the two, 
requiring the student to combine both intuition and formal understanding together 
in order to locate correct proofs for a problem. This I found to be the most difficult
task for the students, as it requires the subject to be genuinely learnt, rather than 
merely memorized or vaguely absorbed. Nevertheless, the feedback I received from 
the students was that the homework, while very demanding for this reason, was also 
very rewarding, as it allowed them to connect the rather abstract manipulations of 
formal mathematics with their innate intuition on such basic concepts as numbers, 
sets, and functions. Of course, the aid of a good teaching assistant is invaluable in 
achieving this connection. 

With regard to examinations for a course based on this text, I would recommend 
either an open-book, open-notes examination with problems similar to the exercises 
given in the text (but perhaps shorter, with no unusual trickery involved), or else 
a take-home examination that involves problems comparable to the more intricate 
exercises in the text. The subject matter is too vast to force the students to memorize 
the definitions and theorems, so I would not recommend a closed-book examination, 
or an examination based on regurgitating extracts from the book. (Indeed, in my own 
examinations I gave a supplemental sheet listing the key definitions and theorems 
which were relevant to the examination problems.) Making the examinations similar 
to the homework assigned in the course will also help motivate the students to work 
through and understand their homework problems as thoroughly as possible (as 
opposed to, say, using flash cards or other such devices to memorize material), which 
is good preparation not only for examinations but for doing mathematics in general. 

Some of the material in this textbook is somewhat peripheral to the main theme 
and may be omitted for reasons of time constraints. For instance, as set theory is 
not as fundamental to analysis as are the number systems, the chapters on set theory 
(Chapters 3, 8) can be covered more quickly and with substantially less rigour, or be 
given as reading assignments. The appendices on logic and the decimal system are 
intended as optional or supplemental reading and would probably not be covered in 
the main course lectures; the appendix on logic is particularly suitable for reading 
concurrently with the first few chapters. Also, Chapter 5 (on Fourier series) is not 
needed elsewhere in the text and can be omitted. 

For reasons of length, this textbook has been split into two volumes. The first 
volume is slightly longer, but can be covered in about thirty lectures if the peripheral 
material is omitted or abridged. The second volume refers at times to the first, but can 
also be taught to students who have had a first course in analysis from other sources. 
It also takes about thirty lectures to cover. 

I am deeply indebted to my students, who over the progression of the real analysis
course corrected several errors in the lecture notes from which this text is
derived, and gave other valuable feedback. I am also very grateful to the many 
anonymous referees who made several corrections and suggested many impor- 
tant improvements to the text. I also thank Adam, James Ameril, Quentin Batista, 
Biswaranjan Behara, José Antonio Lara Benitez, Dingjun Bian, Petrus Bianchi, 
Phillip Blagoveschensky, Tai-Danae Bradley, Brian, Eduardo Buscicchio, Carlos, 
cebismellim, Matheus Silva Costa, Gonzales Castillo Cristhian, Ck, William Deng, 
Kevin Doran, Lorenzo Dragani, EO, Florian, Gyao Gamm, Evangelos Georgiadis, 
Aditya Ghosh, Elie Goudout, Ti Gong, Ulrich Groh, Gökhan Güçlü, Yaver Gulusoy,
Christian Gz., Kyle Hambrook, Minyoung Jeong, Bart Kleijngeld, Erik Koelink, Brett
Lane, David Latorre, Matthis Lehmkühler, Bin Li, Percy Li, Ming Li, Mufei Li, Zijun
Liu, Rami Luisto, Jason M., Manoranjan Majji, Mercedes Mata, Simon Mayer, Geoff
Mess, Pieter Naaijkens, Vineet Nair, Jorge Peña-Vélez, Cristina Pereyra, Huaying
Qiu, David Radnell, Tim Reijnders, Issa Rice, Eric Rodriquez, Pieter Roffelsen,
Luke Rogers, Feras Saad, Gabriel Salmerón, Vijay Sarthak, Leopold Schlicht, Marc
Schoolderman, SkysubO, Rainer aus dem Spring, Sundar, Rafał Szlendak, Karim
Taya, Chaitanya Tappu, Winston Tsai, Kent Van Vels, Andrew Verras, Murtaza
Wani, Daan Wanrooy, John Waters, Yandong Xiao, Sam Xu, Xueping, Hongjiang
Ye, Luqing Ye, Muhammad Atif Zaheer, Zelin, and the students of Math 401/501 and
Math 402/502 at the University of New Mexico for corrections to the first, second, 
and third editions. 


Terence Tao 


Preface to Subsequent Editions 


Since the publication of the first edition, many students and lecturers have commu- 
nicated a number of minor typos and other corrections to me. There was also some 
demand for a hardcover edition of the texts. Because of this, the publishers and I 
have decided to incorporate the corrections and issue a hardcover second edition of 
the textbooks. The layout, page numbering, and indexing of the texts have also been 
changed; in particular the two volumes are now numbered and indexed separately. 
However, the chapter and exercise numbering, as well as the mathematical content, 
remains the same as the first edition, and so the two editions can be used more or 
less interchangeably for homework and study purposes. 

The third edition contains a number of corrections that were reported for the 
second edition, together with a few new exercises, but is otherwise essentially the
same text. The fourth edition similarly incorporates a large number of additional 
corrections reported since the release of the third edition, as well as some additional 
exercises. 


Los Angeles, USA Terence Tao 
Chapter 1
Introduction


1.1. What Is Analysis? 


This text is an honors-level undergraduate introduction to real analysis: the analysis 
of the real numbers, sequences and series of real numbers, and real-valued functions. 
This is related to, but is distinct from, complex analysis, which concerns the analysis 
of the complex numbers and complex functions, harmonic analysis, which concerns 
the analysis of harmonics (waves) such as sine waves, and how they synthesize other 
functions via the Fourier transform, functional analysis, which focuses much more 
heavily on functions (and how they form things like vector spaces), and so forth. 
Analysis is the rigorous study of such objects, with a focus on trying to pin down 
precisely and accurately the qualitative and quantitative behavior of these objects. 
Real analysis is the theoretical foundation which underlies calculus, which is the 
collection of computational algorithms which one uses to manipulate functions. 

In this text we will be studying many objects which will be familiar to you from 
freshman calculus: numbers, sequences, series, limits, functions, definite integrals, 
derivatives, and so forth. You already have a great deal of experience of computing 
with these objects; however here we will be focused more on the underlying theory 
for these objects. We will be concerned with questions such as the following: 


1. What is a real number? Is there a largest real number? After 0, what is the “next” 
real number (i.e., what is the smallest positive real number)? Can you cut a real 
number into pieces infinitely many times? Why does a number such as 2 have 
a square root, while a number such as —2 does not? If there are infinitely many 
reals and infinitely many rationals, how come there are “more” real numbers than 
rational numbers? 

2. How do you take the limit of a sequence of real numbers? Which sequences have 
limits and which ones don’t? If you can stop a sequence from escaping to infinity, 
does this mean that it must eventually settle down and converge? Can you add 
infinitely many real numbers together and still get a finite real number? Can you 
add infinitely many rational numbers together and end up with a non-rational 
number? If you rearrange the elements of an infinite sum, is the sum still the
same? 

3. What is a function? What does it mean for a function to be continuous? differ- 
entiable? integrable? bounded? Can you add infinitely many functions together? 
What about taking limits of sequences of functions? Can you differentiate an 
infinite series of functions? What about integrating? If a function f(x) takes the 
value 3 when x = 0 and 5 when x = 1 (i.e., f(0) = 3 and f(1) = 5), does it have
to take every intermediate value between 3 and 5 when x goes between 0 and 1? 
Why? 


You may already know how to answer some of these questions from your calculus 
classes, but most likely these sorts of issues were only of secondary importance to 
those courses; the emphasis was on getting you to perform computations, such as 
computing the integral of $x \sin(x^2)$ from x = 0 to x = 1. But now that you are
comfortable with these objects and already know how to do all the computations, we 
will go back to the theory and try to really understand what is going on. 


1.2. Why Do Analysis? 


It is a fair question to ask, “why bother?”, when it comes to analysis. There is a
certain philosophical satisfaction in knowing why things work, but a pragmatic person 
may argue that one only needs to know how things work to do real-life problems. 
The calculus training you receive in introductory classes is certainly adequate for 
you to begin solving many problems in physics, chemistry, biology, economics, 
computer science, finance, engineering, or whatever else you end up doing—and 
you can certainly use things like the chain rule, L’Hôpital’s rule, or integration by
parts without knowing why these rules work, or whether there are any exceptions to 
these rules. However, one can get into trouble if one applies rules without knowing 
where they came from and what the limits of their applicability are. Let me give 
some examples in which several of these familiar rules, if applied blindly without 
knowledge of the underlying analysis, can lead to disaster. 


Example 1.2.1 (Division by zero). This is a very familiar one to you: the cancellation
law $ac = bc \implies a = b$ does not work when $c = 0$. For instance, the identity
$1 \times 0 = 2 \times 0$ is true, but if one blindly cancels the 0 then one obtains $1 = 2$, which is
false. In this case it was obvious that one was dividing by zero; but in other cases it 
can be more hidden. 


Example 1.2.2 (Divergent series). You have probably seen geometric series such as
the infinite sum

$$S = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots.$$

You have probably seen the following trick to sum this series: if we call the above
sum S, then if we multiply both sides by 2, we obtain

$$2S = 2 + 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 2 + S$$

and hence S = 2, so the series sums to 2. However, if you apply the same trick to
the series

$$S = 1 + 2 + 4 + 8 + 16 + \cdots$$

one gets nonsensical results:

$$2S = 2 + 4 + 8 + 16 + \cdots = S - 1 \implies S = -1.$$

So the same reasoning that shows that $1 + \frac{1}{2} + \frac{1}{4} + \cdots = 2$ also gives that
$1 + 2 + 4 + 8 + \cdots = -1$. Why is it that we trust the first equation but not the second? A
similar example arises with the series

$$S = 1 - 1 + 1 - 1 + 1 - 1 + \cdots;$$

we can write

$$S = 1 - (1 - 1 + 1 - 1 + \cdots) = 1 - S$$

and hence that S = 1/2; or instead we can write

$$S = (1 - 1) + (1 - 1) + (1 - 1) + \cdots = 0 + 0 + \cdots$$

and hence that S = 0; or instead we can write

$$S = 1 + (-1 + 1) + (-1 + 1) + \cdots = 1 + 0 + 0 + \cdots$$

and hence that S = 1. Which one is correct? (See Exercise 7.2.1 for an answer.)
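
One way to get a feel for the three series in Example 1.2.2 is to tabulate their partial sums. The following is a minimal Python sketch, not part of the original text; the helper names `partial_sums` and `geometric` are made up for this illustration. It only reports numerical behaviour and does not replace the rigorous treatment referred to above.

```python
from itertools import accumulate, islice

def partial_sums(terms, count):
    """Return the first `count` partial sums of an iterable of terms."""
    return list(islice(accumulate(terms), count))

def geometric(ratio):
    """Yield 1, ratio, ratio**2, ... indefinitely (hypothetical helper)."""
    term = 1.0
    while True:
        yield term
        term *= ratio

halving = partial_sums(geometric(0.5), 30)       # 1 + 1/2 + 1/4 + ...
doubling = partial_sums(geometric(2.0), 30)      # 1 + 2 + 4 + ...
alternating = partial_sums(geometric(-1.0), 30)  # 1 - 1 + 1 - ...

print(halving[-1])       # stays just below 2
print(doubling[-1])      # grows without bound, nowhere near -1
print(alternating[-4:])  # keeps alternating between 1 and 0
```

Only the first list of partial sums settles down to a single value, which suggests that the algebraic trick is safe only when the sum S is known to exist in the first place.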


Example 1.2.3 (Divergent sequences). Here is a slight variation of the previous
example. Let x be a real number, and let L be the limit

$$L = \lim_{n \to \infty} x^n.$$

Changing variables n = m + 1, we have

$$L = \lim_{m+1 \to \infty} x^{m+1} = \lim_{m+1 \to \infty} x \times x^m = x \lim_{m+1 \to \infty} x^m.$$

But if $m + 1 \to \infty$, then $m \to \infty$, thus

$$\lim_{m+1 \to \infty} x^m = \lim_{m \to \infty} x^m = \lim_{n \to \infty} x^n = L,$$

and thus

$$xL = L.$$

At this point we could cancel the L’s and conclude that x = 1 for an arbitrary real
number x, which is absurd. But since we are already aware of the division by zero
problem, we could be a little smarter and conclude instead that either x = 1, or
L = 0. In particular we seem to have shown that

$$\lim_{n \to \infty} x^n = 0 \text{ for all } x \neq 1.$$



But this conclusion is absurd if we apply it to certain values of x, for instance by 
specializing to the case x = 2 we could conclude that the sequence 1, 2, 4, 8,... 
converges to zero, and by specializing to the case x = —1 we conclude that the 
sequence 1, —1, 1, —1, ... also converges to zero. These conclusions appear to be 
absurd; what is the problem with the above argument? (See Exercise 6.3.4 for an 
answer.) 
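
As a quick sanity check on Example 1.2.3, one can simply list the powers of x for a few values of x. This is an illustrative Python sketch with a hypothetical helper `powers`; it tabulates values and is not a substitute for the exercise cited above.

```python
def powers(x, count):
    """Return [x**0, x**1, ..., x**(count - 1)]."""
    return [x ** n for n in range(count)]

# Look at the tail of the sequence for several bases.
for x in (0.5, 1.0, 2.0, -1.0):
    tail = powers(x, 40)[-3:]
    print(x, tail)
```

For x = 0.5 the tail is tiny and for x = 1 it is constant, but for x = 2 the terms blow up and for x = -1 they keep changing sign, so in those cases there is no real number L for the manipulation to act on.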


Example 1.2.4 (Limiting values of functions). Start with the expression
$\lim_{x \to \infty} \sin(x)$, make the change of variable $x = y + \pi$ and recall that
$\sin(y + \pi) = -\sin(y)$ to obtain

$$\lim_{x \to \infty} \sin(x) = \lim_{y + \pi \to \infty} \sin(y + \pi) = \lim_{y \to \infty} (-\sin(y)) = -\lim_{y \to \infty} \sin(y).$$

Since $\lim_{x \to \infty} \sin(x) = \lim_{y \to \infty} \sin(y)$ we thus have

$$\lim_{x \to \infty} \sin(x) = -\lim_{x \to \infty} \sin(x)$$

and hence

$$\lim_{x \to \infty} \sin(x) = 0.$$

If we then make the change of variables $x = \pi/2 + z$ and recall that $\sin(\pi/2 + z) = \cos(z)$
we conclude that

$$\lim_{x \to \infty} \cos(x) = 0.$$

Squaring both of these limits and adding we see that

$$\lim_{x \to \infty} (\sin^2(x) + \cos^2(x)) = 0^2 + 0^2 = 0.$$

On the other hand, we have $\sin^2(x) + \cos^2(x) = 1$ for all x. Thus we have shown
that 1 = 0! What is the difficulty here?


Example 1.2.5 (Interchanging sums). Consider the following fact of arithmetic.
Consider any matrix of numbers, e.g.,

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}$$

and compute the sums of all the rows and the sums of all the columns, and then total
all the row sums and total all the column sums. In both cases you will get the same
number, namely the total sum of all the entries in the matrix:

$$\begin{array}{ccc|c} 1 & 2 & 3 & 6 \\ 4 & 5 & 6 & 15 \\ 7 & 8 & 9 & 24 \\ \hline 12 & 15 & 18 & 45 \end{array}$$

To put it another way, if you want to add all the entries in an m × n matrix together,
it doesn’t matter whether you sum the rows first or sum the columns first, you end
up with the same answer. (Before the invention of computers, accountants and
bookkeepers would use this fact to guard against making errors when balancing their
books.) In series notation, this fact would be expressed as

$$\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{m} a_{ij},$$

if $a_{ij}$ denoted the entry in the ith row and jth column of the matrix.
Now one might think that this rule should extend easily to infinite series:

$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}.$$

Indeed, if you use infinite series a lot in your work, you will find yourself having to
switch summations like this fairly often. Another way of saying this fact is that in an
infinite matrix, the sum of the row totals should equal the sum of the column totals.
However, despite the reasonableness of this statement, it is actually false! Here is a
counterexample:

$$\begin{pmatrix} 1 & 0 & 0 & 0 & \cdots \\ -1 & 1 & 0 & 0 & \cdots \\ 0 & -1 & 1 & 0 & \cdots \\ 0 & 0 & -1 & 1 & \cdots \\ 0 & 0 & 0 & -1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$


If you sum up all the rows, and then add up all the row totals, you get 1; but if you sum 
up all the columns, and add up all the column totals, you get 0! So, does this mean 
that summations for infinite series should not be swapped and that any argument 
using such a swapping should be distrusted? (See Theorem 8.2.2 for an answer.) 
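
The row and column totals in Example 1.2.5 can be checked mechanically. The sketch below is an illustration in Python, not part of the original text; the function names are hypothetical, and it assumes the counterexample matrix has 1 on the diagonal, −1 just below the diagonal, and 0 elsewhere. Each row and each column has only finitely many nonzero entries, so every individual total is an honest finite sum.

```python
def entry(i, j):
    """Entry in row i, column j (both counted from 1) of the infinite matrix."""
    if i == j:
        return 1
    if i == j + 1:
        return -1
    return 0

def row_total(i):
    # Row i is zero beyond column i, so this finite sum is the full row total.
    return sum(entry(i, j) for j in range(1, i + 1))

def column_total(j):
    # Column j is zero below row j + 1, so this finite sum is the full column total.
    return sum(entry(i, j) for i in range(1, j + 2))

N = 1000
rows_first = sum(row_total(i) for i in range(1, N + 1))
columns_first = sum(column_total(j) for j in range(1, N + 1))
print(rows_first, columns_first)

# The finite 3 x 3 example: both orders give the same grand total.
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(sum(sum(row) for row in grid), sum(sum(col) for col in zip(*grid)))
```

The first row total is 1 and every later row total is 0, while every column total is 0, so the two running totals disagree no matter how large N is taken.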


Example 1.2.6 (Interchanging integrals). The interchanging of integrals is a trick 
which occurs in mathematics just as commonly as the interchanging of sums. Suppose 
one wants to compute the volume under a surface z = f(x, y) (let us ignore the limits 
of integration for the moment). One can do it by slicing parallel to the x-axis: for
each fixed value of y, we can compute an area $\int f(x, y)\,dx$, and then we integrate
the area in the y variable to obtain the volume

$$V = \int \int f(x, y)\,dx\,dy.$$

Or we could slice parallel to the y-axis for each fixed x and compute an area
$\int f(x, y)\,dy$ and then integrate in the x-axis to obtain

$$V = \int \int f(x, y)\,dy\,dx.$$

This seems to suggest that one should always be able to swap integral signs:

$$\int \int f(x, y)\,dx\,dy = \int \int f(x, y)\,dy\,dx.$$

And indeed, people swap integral signs all the time, because sometimes one variable
is easier to integrate in first than the other. However, just as infinite sums sometimes
cannot be swapped, integrals are also sometimes dangerous to swap. An example is
with the integrand $e^{-xy} - xye^{-xy}$. Suppose we believe that we can swap the integrals:

$$\int_0^\infty \int_0^1 (e^{-xy} - xye^{-xy})\,dy\,dx = \int_0^1 \int_0^\infty (e^{-xy} - xye^{-xy})\,dx\,dy. \tag{1.1}$$

Since

$$\int_0^1 (e^{-xy} - xye^{-xy})\,dy = ye^{-xy}\Big|_{y=0}^{y=1} = e^{-x},$$

the left-hand side of (1.1) is $\int_0^\infty e^{-x}\,dx = -e^{-x}\big|_0^\infty = 1$. But since

$$\int_0^\infty (e^{-xy} - xye^{-xy})\,dx = xe^{-xy}\Big|_{x=0}^{x=\infty} = 0,$$

the right-hand side of (1.1) is $\int_0^1 0\,dx = 0$. Clearly $1 \neq 0$, so there is an error somewhere;
but you won’t find one anywhere except in the step where we interchanged
the integrals. So how do we know when to trust the interchange of integrals? (See 
Theorem 8.5.1 of Analysis II for a partial answer.)
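
To see where the two sides of (1.1) part ways, it can help to cut the x-integration off at a finite value X. The Python sketch below is an illustration under that assumption (the cutoff X, the helper names, and the simple midpoint rule are all choices made here, not taken from the text). It uses the antiderivatives $ye^{-xy}$ and $xe^{-xy}$ computed in the example.

```python
import math

def inner_dy(x):
    """Integral over y in [0, 1]: y*exp(-x*y) evaluated from y = 0 to y = 1."""
    return math.exp(-x)

def inner_dx_truncated(y, X):
    """Integral over x in [0, X] rather than [0, inf): x*exp(-x*y) from 0 to X."""
    return X * math.exp(-X * y)

def midpoint(func, a, b, steps=20000):
    """Plain midpoint rule; adequate for these smooth integrands."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))

X = 50.0  # assumed finite cutoff for the x-integration
dy_then_dx = midpoint(inner_dy, 0.0, X)
dx_then_dy = midpoint(lambda y: inner_dx_truncated(y, X), 0.0, 1.0)
print(dy_then_dx, dx_then_dy)      # both close to 1 for a finite cutoff
print(inner_dx_truncated(0.5, X))  # yet the inner value at y = 0.5 is nearly 0
```

With a finite cutoff the two orders of integration agree. The disagreement in the example appears only once the cutoff is sent to infinity inside the outer integral, where the inner value tends to 0 for each fixed y > 0 even though its integral over y stays near 1.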
Example 1.2.7 (Interchanging limits). Suppose we start with the plausible looking
statement

$$\lim_{x \to 0} \lim_{y \to 0} \frac{x^2}{x^2 + y^2} = \lim_{y \to 0} \lim_{x \to 0} \frac{x^2}{x^2 + y^2}. \tag{1.2}$$

But we have

$$\lim_{y \to 0} \frac{x^2}{x^2 + y^2} = \frac{x^2}{x^2 + 0^2} = 1,$$

so the left-hand side of (1.2) is 1; on the other hand, we have

$$\lim_{x \to 0} \frac{x^2}{x^2 + y^2} = \frac{0^2}{0^2 + y^2} = 0,$$

so the right-hand side of (1.2) is 0. Since 1 is clearly not equal to zero, this suggests
that interchange of limits is untrustworthy. But are there any other circumstances in
which the interchange of limits is legitimate? (See Exercise 2.2.9 of Analysis II for
a partial answer.)
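
The two inner limits in Example 1.2.7 are easy to probe numerically. The following Python sketch (an illustration only; the sample points are arbitrary choices) evaluates the function along three different routes to the origin.

```python
def f(x, y):
    """The function x^2 / (x^2 + y^2), defined away from the origin."""
    return x * x / (x * x + y * y)

small = [10.0 ** -k for k in range(1, 8)]

# Hold x fixed at 0.5 and let y shrink: the values approach 1.
print([round(f(0.5, y), 6) for y in small])

# Hold y fixed at 0.5 and let x shrink: the values approach 0.
print([round(f(x, 0.5), 6) for x in small])

# Along the diagonal x = y the value is always 1/2.
print(f(1e-9, 1e-9))
```

The value seen near the origin depends on the route taken, which is consistent with the two iterated limits disagreeing.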


Example 1.2.8 (Interchanging limits, again). Consider the plausible looking
statement

$$\lim_{x \to 1^-} \lim_{n \to \infty} x^n = \lim_{n \to \infty} \lim_{x \to 1^-} x^n$$

where the notation $x \to 1^-$ means that x is approaching 1 from the left. When x
is to the left of 1, then $\lim_{n \to \infty} x^n = 0$, and hence the left-hand side is zero. But
we also have $\lim_{x \to 1^-} x^n = 1$ for all n, and so the right-hand side limit is 1. Does
this demonstrate that this type of limit interchange is always untrustworthy? (See
Proposition 3.3.3 of Analysis II for an answer.)
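
A small table makes the tension in Example 1.2.8 visible. This Python sketch is purely illustrative; the sample values of x and n are arbitrary.

```python
def power(x, n):
    """Return x raised to the n-th power."""
    return x ** n

# Fix x slightly below 1 and let n grow: the values fall towards 0.
print([power(0.99, n) for n in (10, 100, 1000, 10000)])

# Fix n and let x rise towards 1: the values climb towards 1.
print([power(x, 100) for x in (0.9, 0.99, 0.999, 0.9999)])
```

Which limit is taken first decides whether the numbers head to 0 or to 1.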


Example 1.2.9 (Interchanging limits and integrals). For any real number y, we have

$$\int_{-\infty}^{\infty} \frac{1}{1 + (x - y)^2}\,dx = \arctan(x - y)\Big|_{x=-\infty}^{\infty} = \frac{\pi}{2} - \left(-\frac{\pi}{2}\right) = \pi.$$

Taking limits as $y \to \infty$, we should obtain

$$\int_{-\infty}^{\infty} \lim_{y \to \infty} \frac{1}{1 + (x - y)^2}\,dx = \lim_{y \to \infty} \int_{-\infty}^{\infty} \frac{1}{1 + (x - y)^2}\,dx = \pi.$$

But for every x, we have $\lim_{y \to \infty} \frac{1}{1 + (x - y)^2} = 0$. So we seem to have concluded that
$0 = \pi$. What was the problem with the above argument? Should one abandon the
(very useful) technique of interchanging limits and integrals? (See Theorem 3.6.1 of
Analysis II for a partial answer.)
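
The integrand in Example 1.2.9 is a bump centred at x = y, and the sketch below tracks what happens as the bump slides to the right. It is an illustration in Python with hypothetical helper names; the window half-widths 100 and 10^9 are arbitrary choices, and the window integral uses the antiderivative arctan(x − y) from the example.

```python
import math

def bump(x, y):
    """The integrand 1 / (1 + (x - y)^2)."""
    return 1.0 / (1.0 + (x - y) ** 2)

def window_integral(y, R):
    """Exact integral of the bump over [-R, R], via the antiderivative arctan(x - y)."""
    return math.atan(R - y) - math.atan(-R - y)

for y in (0.0, 10.0, 1000.0, 1e6):
    # value at x = 0, mass in a fixed window, mass in a very wide window
    print(y, bump(0.0, y), window_integral(y, 100.0), window_integral(y, 1e9))
```

At any fixed x the integrand dies away, and so does the mass inside any fixed window, but the total mass stays close to π because it has simply moved off to the right.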
Example 1.2.10 (Interchanging limits and derivatives). Observe that if $\varepsilon > 0$, then

$$\frac{d}{dx}\left(\frac{x^3}{\varepsilon^2 + x^2}\right) = \frac{3x^2(\varepsilon^2 + x^2) - 2x^4}{(\varepsilon^2 + x^2)^2}$$

and in particular that

$$\frac{d}{dx}\left(\frac{x^3}{\varepsilon^2 + x^2}\right)\bigg|_{x=0} = 0.$$

Taking limits as $\varepsilon \to 0$, one might then expect that

$$\frac{d}{dx}\left(\frac{x^3}{0 + x^2}\right)\bigg|_{x=0} = 0.$$

But the left-hand side is $\frac{d}{dx} x = 1$. Does this mean that it is always illegitimate to
interchange limits and derivatives? (See Theorem 3.7.1 of Analysis II for an answer.)
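
The slopes at x = 0 in Example 1.2.10 can be estimated with a difference quotient. The Python sketch below is an illustration only; the step size h and the convention g(0, 0) = 0 are assumptions made here so that the code is defined everywhere.

```python
def g(x, eps):
    """x^3 / (eps^2 + x^2), extended by g(0, 0) = 0 (an assumption for this sketch)."""
    denom = eps * eps + x * x
    return 0.0 if denom == 0.0 else x ** 3 / denom

def slope_at_zero(eps, h=1e-8):
    """Symmetric difference quotient for d/dx g(x, eps) at x = 0.

    The step h is kept far smaller than any positive eps used below;
    otherwise the quotient would not reflect the derivative at 0.
    """
    return (g(h, eps) - g(-h, eps)) / (2 * h)

print([slope_at_zero(eps) for eps in (1.0, 0.1, 0.01)])  # all near 0
print(slope_at_zero(0.0))                                # near 1
```

Every positive ε gives slope 0 at the origin, yet the limiting function x has slope 1 there.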


Example 1.2.11 (Interchanging derivatives). Let¹ f(x, y) be the function
$f(x, y) := \frac{xy^3}{x^2 + y^2}$. A common maneuver in analysis is to interchange two partial
derivatives, thus one expects

$$\frac{\partial^2 f}{\partial x \partial y}(0, 0) = \frac{\partial^2 f}{\partial y \partial x}(0, 0).$$

But from the quotient rule we have

$$\frac{\partial f}{\partial y}(x, y) = \frac{3xy^2}{x^2 + y^2} - \frac{2xy^4}{(x^2 + y^2)^2}$$

and in particular

$$\frac{\partial f}{\partial y}(x, 0) = \frac{0}{x^2} - \frac{0}{x^4} = 0.$$

Thus

$$\frac{\partial^2 f}{\partial x \partial y}(0, 0) = 0.$$

On the other hand, from the quotient rule again we have

$$\frac{\partial f}{\partial x}(x, y) = \frac{y^3}{x^2 + y^2} - \frac{2x^2 y^3}{(x^2 + y^2)^2}$$

and hence

$$\frac{\partial f}{\partial x}(0, y) = \frac{y^3}{y^2} - \frac{0}{y^4} = y.$$

Thus

$$\frac{\partial^2 f}{\partial y \partial x}(0, 0) = 1.$$

Since $1 \neq 0$, we thus seem to have shown that interchange of derivatives is
untrustworthy. But are there any other circumstances in which the interchange of derivatives
is legitimate? (See Theorem 6.5.4 and Exercise 6.5.1 of Analysis II for some answers.)

¹ One might object that this function is not defined at (x, y) = (0, 0), but if we set f(0, 0) := 0 then
this function becomes continuous and differentiable for all (x, y), and in fact both partial derivatives
$\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ are also continuous and differentiable for all (x, y)!
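
The two mixed partial derivatives in Example 1.2.11 can be approximated by nested difference quotients. The Python sketch below is an illustration only; the step sizes H and K are assumptions chosen so that the inner step is much smaller than the outer one, and f(0, 0) is set to 0 as in the footnote.

```python
def f(x, y):
    """x*y^3 / (x^2 + y^2), with f(0, 0) set to 0 as in the footnote."""
    denom = x * x + y * y
    return 0.0 if denom == 0.0 else x * y ** 3 / denom

H = 1e-6  # inner step, assumed much smaller than K
K = 1e-2  # outer step

def df_dx(x, y):
    """Symmetric difference quotient in the x direction."""
    return (f(x + H, y) - f(x - H, y)) / (2 * H)

def df_dy(x, y):
    """Symmetric difference quotient in the y direction."""
    return (f(x, y + H) - f(x, y - H)) / (2 * H)

# Differentiate df/dx in y, and df/dy in x, both at the origin.
dy_of_dx = (df_dx(0.0, K) - df_dx(0.0, -K)) / (2 * K)
dx_of_dy = (df_dy(K, 0.0) - df_dy(-K, 0.0)) / (2 * K)
print(dy_of_dx, dx_of_dy)  # roughly 1 and roughly 0
```

The numerical estimates match the values 1 and 0 obtained from the quotient rule, so the discrepancy is a feature of the function and not an algebra slip.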


Example 1.2.12 (L’H6pital’s rule). We are all familiar with the beautifully simple 
L H6pital’s rule 
fx) f'Q) 


lim = lim —_, 
x—>Xx9 g(x) 1X0 g (x) 


but one can still get led to incorrect conclusions if one applies it incorrectly. For 
instance, applying it to f(x) := x, g(x) := 1+ x, and xo := 0 we would obtain 


. * . ol 
lim = lim —- = 1, 
x-01+x x0] 


but this is the incorrect answer, since lim,-_,9 reer = 75 = 0. Of course, all that is 
going on here is that L H6pital’s rule is only applicable when both f(x) and g(x) 
go to zero as x — Xo, a condition which was violated in the above example. But 
even when f(x) and g(x) do go to zero as x — xg there is still a possibility for an 
incorrect conclusion. For instance, consider the limit 
$$\lim_{x \to 0} \frac{x^2 \sin(x^{-4})}{x}.$$
Both numerator and denominator go to zero as x → 0, so it seems pretty safe to
apply L’Hôpital’s rule, to obtain
$$\lim_{x \to 0} \frac{x^2 \sin(x^{-4})}{x} = \lim_{x \to 0} \frac{2x \sin(x^{-4}) - 4x^{-3} \cos(x^{-4})}{1} = \lim_{x \to 0} 2x \sin(x^{-4}) - \lim_{x \to 0} 4x^{-3} \cos(x^{-4}).$$
The first limit converges to zero by the squeeze test (since the function $2x \sin(x^{-4})$
is bounded above by 2|x| and below by −2|x|, both of which go to zero at 0). But the
second limit is divergent (because $x^{-3}$ goes to infinity as x → 0, and $\cos(x^{-4})$ does
not go to zero). So the limit $\lim_{x \to 0} \frac{2x \sin(x^{-4}) - 4x^{-3} \cos(x^{-4})}{1}$ diverges. One might then
conclude using L’Hôpital’s rule that $\lim_{x \to 0} \frac{x^2 \sin(x^{-4})}{x}$ also diverges; however we can
clearly rewrite this limit as $\lim_{x \to 0} x \sin(x^{-4})$, which goes to zero when x → 0 by the
squeeze test again. This does not show that L’Hôpital’s rule is untrustworthy (indeed,
it is quite rigorous; see Sect. 10.5), but it still requires some care when applied.
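
As an informal numerical sanity check (a sketch only; the function names are illustrative and not part of the text), one can compare the original quotient with the quotient of derivatives at points approaching 0. The sample points x_k := (2πk)^(-1/4) are an assumption of this sketch: they are chosen so that cos(x_k^(-4)) equals 1, up to floating-point rounding.

```python
import math

def original_quotient(x: float) -> float:
    """x^2 sin(x^-4) / x, which equals x sin(x^-4) for x != 0."""
    return x * x * math.sin(x ** -4) / x

def derivative_quotient(x: float) -> float:
    """(2x sin(x^-4) - 4 x^-3 cos(x^-4)) / 1, the quotient of the derivatives."""
    return 2 * x * math.sin(x ** -4) - 4 * x ** -3 * math.cos(x ** -4)

def spike_point(k: int) -> float:
    """A point x > 0 with x^-4 = 2*pi*k, so that cos(x^-4) is (nearly) 1."""
    return (2 * math.pi * k) ** -0.25

for k in (1, 10, 100, 1000, 10_000):
    x = spike_point(k)
    # The first quotient is at most |x| in magnitude; the second is close to -4 / x^3.
    print(f"x = {x:.4f}   f/g = {original_quotient(x):+.3e}   f'/g' = {derivative_quotient(x):+.3e}")
```

The first quotient stays within ±|x| while the second grows in magnitude as x shrinks, in line with the discussion above. A finite table of values proves nothing about limits, of course; it only suggests where the hypotheses of the rule deserve a closer look.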


Example 1.2.13 (Limits and lengths). When you learnt about integration and how it
relates to the area under a curve, you were probably presented with some picture in 
which the area under the curve was approximated by a bunch of rectangles, whose 
area was given by a Riemann sum, and then one somehow “took limits” to replace 
that Riemann sum with an integral, which then presumably matched the actual area 
under the curve. Perhaps a little later, you learnt how to compute the length of a curve 
by a similar method—approximate the curve by a bunch of line segments, compute 
the length of all the line segments, and then take limits again to see what you get. 

However, it should come as no surprise by now that this approach also can lead to 
nonsense if used incorrectly. Consider the right-angled triangle with vertices (0, 0), 
(1, 0), and (0, 1), and suppose we wanted to compute the length of the hypotenuse 
of this triangle. Pythagoras’ theorem tells us that this hypotenuse has length √2,
but suppose for some reason that we did not know about Pythagoras’ theorem, and 
wanted to compute the length using calculus methods. Well, one way to do so is to 
approximate the hypotenuse by horizontal and vertical edges. Pick a large number N,
and approximate the hypotenuse by a “staircase” consisting of N horizontal edges of 
equal length, alternating with N vertical edges of equal length. Clearly these edges 
all have length 1/N, so the total length of the staircase is 2N/N = 2. If one takes 
limits as N goes to infinity, the staircase clearly approaches the hypotenuse, and so in 
the limit we should get the length of the hypotenuse. However, as N → ∞, the limit
of 2N/N is 2, not √2, so we have an incorrect value for the length of the hypotenuse.
How did this happen? 
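
The paradox can be reproduced numerically. The following sketch (the helper names are illustrative, and the staircase is assumed to start at (0, 1) and step down first, then across) builds the staircase for a given N and measures both its total length and its greatest distance from the hypotenuse, which lies on the line x + y = 1.

```python
import math

def staircase_vertices(n: int) -> list:
    """Vertices of a staircase from (0, 1) to (1, 0) with n steps of size 1/n."""
    points = [(0.0, 1.0)]
    for k in range(n):
        points.append((k / n, 1 - (k + 1) / n))        # vertical edge, going down
        points.append(((k + 1) / n, 1 - (k + 1) / n))  # horizontal edge, going right
    return points

def path_length(points: list) -> float:
    """Sum of the lengths of the edges joining consecutive vertices."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def max_distance_to_hypotenuse(points: list) -> float:
    """Greatest distance from a vertex to the line x + y = 1."""
    return max(abs(x + y - 1) / math.sqrt(2) for x, y in points)

for n in (1, 10, 100, 1000):
    pts = staircase_vertices(n)
    print(n, path_length(pts), max_distance_to_hypotenuse(pts))
```

The distance to the hypotenuse shrinks like 1/(N√2), so the staircases really do approach the hypotenuse, yet the length stays at 2 for every N. Closeness of the curves, in this sense, is evidently not enough to force the lengths to converge.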


The analysis you learn in this text will help you resolve these questions, and will 
let you know when these rules (and others) are justified, and when they are illegal, 
thus separating the useful applications of these rules from the nonsense. Thus they 
can prevent you from making mistakes and can help you place these rules in a wider 
context. Moreover, as you learn analysis you will develop an “analytical way of 
thinking”, which will help you whenever you come into contact with any new rules 
of mathematics, or when dealing with situations which are not quite covered by the 
standard rules. For instance, what if your functions are complex-valued instead of 
real-valued? What if you are working on the sphere instead of the plane? What if 
your functions are not continuous, but are instead things like square waves and delta 
functions? What if your functions, or limits of integration, or limits of summation, 
are occasionally infinite? You will develop a sense of why a rule in mathematics 
(e.g., the chain rule) works, how to adapt it to new situations, and what its limitations 
(if any) are; this will allow you to apply the mathematics you have already learnt
more confidently and correctly. 


Chapter 2
Starting at the Beginning:
The Natural Numbers


In this text, we will review the material you have learnt in high school and in elemen- 
tary calculus classes, but as rigorously as possible. To do so we will have to begin at 
the very basics - indeed, we will go back to the concept of numbers and what their 
properties are. Of course, you have dealt with numbers for over ten years and you 
know how to manipulate the rules of algebra to simplify any expression involving 
numbers, but we will now turn to a more fundamental issue, which is: why do the 
rules of algebra work at all? For instance, why is it true that a(b + c) is equal to 
ab + ac for any three numbers a, b, c? This is not an arbitrary choice of rule; it can 
be proven from more primitive, and more fundamental, properties of the number 
system. This will teach you a new skill - how to prove complicated properties from 
simpler ones. You will find that even though a statement may be “obvious”, it may 
not be easy to prove; the material here will give you plenty of practice in doing so, 
and in the process will lead you to think about why an obvious statement really is 
obvious. One skill in particular that you will pick up here is the use of mathematical 
induction, which is a basic tool in proving things in many areas of mathematics. 

So in the first few chapters we will re-acquaint you with various number systems 
that are used in real analysis. In increasing order of sophistication, they are the natural 
numbers N, the integers Z, the rationals Q, and the real numbers R. (There are other
number systems such as the complex numbers C, but we will not study them until 
Sect. 4.6.) The natural numbers {0, 1, 2,...} are the most primitive of the number 
systems, but they are used to build the integers, which in turn are used to build the 
rationals. Furthermore, the rationals are used to build the real numbers, which are 
in turn used to build the complex numbers. Thus to begin at the very beginning, we 
must look at the natural numbers. We will consider the following question: how does 
one actually define the natural numbers? (This is a very different question from how 
to use the natural numbers, which is something you of course know how to do very 
well. It’s like the difference between knowing how to use, say, a computer, versus 
knowing how to build that computer.) 

This question is more difficult to answer than it looks. The basic problem is that
you have used the natural numbers for so long that they are embedded deeply into 
your mathematical thinking, and you can make various implicit assumptions about 
these numbers (e.g., that a + b is always equal to b + a) without even being aware 
that you are doing so; it is difficult to let go and try to inspect this number system 
as if it is the first time you have seen it. So in what follows I will have to ask you to 
perform a rather difficult task: try to set aside, for the moment, everything you know 
about the natural numbers; forget that you know how to count, to add, to multiply, to 
manipulate the rules of algebra, etc. We will try to introduce these concepts one at a 
time and identify explicitly what our assumptions are as we go along—and not allow 
ourselves to use more “advanced” tricks such as the rules of algebra until we have 
actually proven them. This may seem like an irritating constraint, especially as we 
will spend a lot of time proving statements which are “obvious”, but it is necessary 
to do this suspension of known facts to avoid circularity (e.g., using an advanced 
fact to prove a more elementary fact, and then later using the elementary fact to 
prove the advanced fact). Also, this exercise will be an excellent way to affirm the 
foundations of your mathematical knowledge. Furthermore, practicing your proofs 
and abstract thinking here will be invaluable when we move on to more advanced 
concepts, such as real numbers, functions, sequences and series, differentials and 
integrals, and so forth. In short, the results here may seem trivial, but the journey is 
much more important than the destination, for now. (Once the number systems are 
constructed properly, we can resume using the laws of algebra, etc., without having 
to rederive them each time.) 

We will also forget that we know the decimal system, which of course is an 
extremely convenient way to manipulate numbers, but it is not something which is 
fundamental to what numbers are. (For instance, one could use an octal or binary 
system instead of the decimal system, or even the Roman numeral system, and still 
get exactly the same set of numbers.) Besides, if one tries to fully explain what the 
decimal number system is, it isn’t as natural as you might think. Why is 00423 the 
same number as 423, but 32400 isn’t the same number as 324? Why is 123.4444... 
a real number, while ...444.321 is not? And why do we have to carry digits when
adding or multiplying? Why is 0.999... the same number as 1? What is the smallest
positive real number? Isn’t it just 0.00...001? So to set aside these problems, we 
will not try to assume any knowledge of the decimal system, though we will of course 
still refer to numbers by their familiar names such as 1, 2, and 3 instead of using 
other notation such as I, II, III or 0++, (0++)++, ((0++)++)++ (see below) so as
not to be needlessly artificial. For completeness, we review the decimal system in 
Appendix B. 


2.1 The Peano Axioms 


We now present one standard way to define the natural numbers, in terms of the 
Peano axioms, which were first laid out by Giuseppe Peano (1858-1932). This is not 
the only way to define the natural numbers. For instance, another approach is to talk 
about the cardinality of finite sets; for instance one could take a set of five elements
and define 5 to be the number of elements in that set. We shall discuss this alternate 
approach in Sect. 3.6. However, we shall stick with the Peano axiomatic approach 
for now. 

How are we to define what the natural numbers are? Informally, we could say 


Definition 2.1.1 (Informal) A natural number is any element of the set
N := {0, 1, 2, 3, 4, ...},


which is the set of all the numbers created by starting with 0 and then counting 
forward indefinitely. We call N the set of natural numbers. 


Remark 2.1.2 In some texts the natural numbers start at 1 instead of 0, but this is a 
matter of notational convention more than anything else. In this text we shall refer 
to the set {1, 2, 3, ...} as the positive integers Z⁺ rather than the natural numbers.
Natural numbers are sometimes also known as whole numbers. 


In a sense, this definition solves the problem of what the natural numbers are: a 
natural number is any element of the set¹ N. However, it is not really that satisfactory,
because it begs the question of what N is. This definition of “start at 0 and count 
indefinitely” seems like an intuitive enough definition of N, but it is not entirely 
acceptable, because it leaves many questions unanswered. For instance: how do we 
know we can keep counting indefinitely, without cycling back to 0? Also, how do 
you perform operations such as addition, multiplication, or exponentiation? 

We can answer the latter question first: we can define complicated operations in 
terms of simpler operations. Exponentiation is nothing more than repeated multiplication:
5³ is nothing more than three fives multiplied together. Multiplication is
nothing more than repeated addition; 5 × 3 is nothing more than three fives added
together. (Subtraction and division will not be covered here, because they are not 
operations which are well-suited to the natural numbers; they will have to wait for 
the integers and rationals, respectively.) And addition? It is nothing more than the 
repeated operation of counting forward, or incrementing. If you add three to five, 
what you are doing is incrementing five three times. On the other hand, incrementing 
seems to be a fundamental operation, not reducible to any simpler operation; indeed, 
it is the first operation one learns on numbers, even before learning to add. 
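
The layering just described can be sketched in a few lines of code. This is only an informal illustration: the function names are hypothetical, and the sketch leans on Python's built-in integers and loop counting, which is exactly the kind of prior knowledge we will shortly ask you to set aside. The only arithmetic used directly is the single increment inside `increment`.

```python
def increment(n: int) -> int:
    """The one primitive operation: count forward by one."""
    return n + 1

def add(count: int, start: int) -> int:
    """Add `count` to `start` by incrementing `start` that many times."""
    result = start
    for _ in range(count):
        result = increment(result)
    return result

def multiply(value: int, copies: int) -> int:
    """value x copies: add together `copies` copies of `value`."""
    result = 0
    for _ in range(copies):
        result = add(value, result)
    return result

def power(base: int, copies: int) -> int:
    """base^copies: multiply together `copies` copies of `base`."""
    result = 1
    for _ in range(copies):
        result = multiply(result, base)
    return result

print(add(3, 5), multiply(5, 3), power(5, 3))  # prints: 8 15 125
```

Each operation is built only from the one below it, so everything ultimately rests on `increment`.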

Thus, to define the natural numbers, we will use two fundamental concepts: the
zero number 0 and the increment operation (also known as the successor operation).
In deference to modern computer languages, we will use n++ to denote² the


1 Strictly speaking, there is another problem with this informal definition: we have not yet defined
what a “set” is or what “element of” is. Thus for the rest of this chapter we shall avoid mention of 
sets and their elements as much as possible, except in informal discussion. 

2 The notation Sn or S(n) is also often used in the literature to denote the successor n++ of n. One
may be tempted to use the more familiar notation n + 1 in place of n++ to denote the successor
of n, but this would introduce a circularity in our foundations, since the notion of addition will be 
defined in terms of the successor operation. 
increment or successor of n, thus for instance 3++ = 4, (3++)++ = 5, etc. This
is a slightly different usage from that in computer languages such as C, where n++ 
actually redefines the value of n to be its successor; however in mathematics we try 
not to define a variable more than once in any given setting, as it can often lead to 
confusion; many of the statements which were true for the old value of the variable 
can now become false, and vice versa. 

So, it seems like we want to say that N consists of 0 and everything which can be 
obtained from 0 by incrementing: N should consist of the objects 


0, 0++, (0++)++, ((0++)++)++, etc.


If we start writing down what this means about the natural numbers, we thus see that 
we should have the following axioms concerning 0 and the increment operation ++:


Axiom 2.1 0 is a natural number. 
Axiom 2.2 If n is a natural number, then n++ is also a natural number.


Thus for instance, from Axiom 2.1 and two applications of Axiom 2.2, we see that 
(0++)++ is a natural number. Of course, this notation will begin to get unwieldy,
so we adopt a convention to write these numbers in more familiar notation: 


Definition 2.1.3 We define³ 1 to be the number 0++, 2 to be the number (0++)++,
3 to be the number ((0++)++)++, etc. (In other words, 1 := 0++, 2 := 1++, 3 :=
2++, etc. In this text I use “x := y” to denote the statement that x is defined to
equal y.)
Thus for instance, we have 
Proposition 2.1.4 3 is a natural number. 


Proof By Axiom 2.1, 0 is a natural number. By Axiom 2.2, 0++ = 1 is a natural 
number. By Axiom 2.2 again, 1++ = 2 is a natural number. By Axiom 2.2 again, 
2++ = 3 is a natural number. 


It may seem that this is enough to describe the natural numbers. However, we 
have not pinned down completely the behavior of N: 


Example 2.1.5 Consider a number system which consists of the numbers 0, 1, 2, 3, 
in which the increment operation wraps back from 3 to 0. More precisely 0++ is 
equal to 1, 1++ is equal to 2, 2++ is equal to 3, but 3++ is equal to 0 (and also equal
to 4, by definition of 4). This type of thing actually happens in real life, when one 
uses a computer to try to store a natural number: if one starts at 0 and performs the 
increment operation repeatedly, eventually the computer will overflow its memory 
and the number will wrap around back to 0 (though this may take quite a large number 


3 This convention is actually an oversimplification. To see how to properly merge the usual decimal 
notation for numbers with the natural numbers given by the Peano axioms, see Appendix B. 
of incrementation operations, for instance a two-byte representation of an integer will
wrap around only after 65,536 increments). Note that this type of number system 
obeys Axiom 2.1 and Axiom 2.2, even though it clearly does not correspond to what 
we intuitively believe the natural numbers to be like. 
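
The wrap-around is easy to simulate. In the sketch below (the function names and the `size` parameter are illustrative assumptions, not notation from the text), a counter with `size` possible values is incremented repeatedly, starting from 0.

```python
def wrapping_increment(n: int, size: int) -> int:
    """Increment n, wrapping back to 0 once all `size` values are used up."""
    return (n + 1) % size

def increment_repeatedly(times: int, size: int) -> int:
    """Start at 0 and apply the wrapping increment `times` times."""
    n = 0
    for _ in range(times):
        n = wrapping_increment(n, size)
    return n

print(increment_repeatedly(4, 4))           # the four-element system above: prints 0
print(increment_repeatedly(65_536, 2**16))  # a two-byte counter: prints 0
```

In both systems 0 is reached again by incrementing, which is precisely the behaviour that Axioms 2.1 and 2.2 alone fail to exclude.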


To prevent this sort of “wrap-around issue” we will impose another axiom: 


Axiom 2.3 0 is not the successor of any natural number; i.e., we have n++ ≠ 0 for
every natural number n. 


Now we can show that certain types of wrap around do not occur: for instance we 
can now rule out the type of behavior in Example 2.1.5 using


Proposition 2.1.6 4 is not equal to 0. 


Don’t laugh! Because of the way we have defined 4—it is the increment of the 
increment of the increment of the increment of 0—it is not necessarily true a priori 
that this number is not the same as zero, even if it is “obvious”. (“a priori” is Latin 
for “beforehand”; it refers to what one already knows or assumes to be true before
one begins a proof or argument. The opposite is “a posteriori”: what one knows to
be true after the proof or argument is concluded.) Note for instance that in Example 
2.1.5, 4 was indeed equal to 0, and that in a standard two-byte computer representation 
of a natural number, for instance, 65,536 is equal to 0 (using our definition of 65,536 
as equal to 0 incremented sixty-five thousand, five hundred and thirty-six times). 


Proof By definition, 4 = 3++. By Axioms 2.1 and 2.2, 3 is a natural number. Thus 
by Axiom 2.3, 3++ ≠ 0, i.e., 4 ≠ 0.


However, even with our new axiom, it is still possible that our number system 
behaves in other pathological ways: 


Example 2.1.7 Consider a number system consisting of five numbers 0, 1, 2, 3, 
4, in which the increment operation hits a “ceiling” at 4. More precisely, suppose 
that 0++ = 1, 1++ = 2, 2++ = 3, 3++ = 4, but 4++ = 4 (or in other words that
5 = 4, and hence 6 = 4, 7 = 4, etc.). This does not contradict Axioms 2.1, 2.2 and
2.3. Another number system with a similar problem is one in which incrementation
wraps around, but not to zero, e.g., suppose that 4++ = 1 (so that 5 = 1, then 6 = 2,
etc.). 


There are many ways to prohibit the above types of behavior from happening, but 
one of the simplest is to assume the following axiom: 


Axiom 2.4 Different natural numbers must have different successors; i.e., if n, m 
are natural numbers and n ≠ m, then n++ ≠ m++. Equivalently,⁴ if n++ = m++
then we must have n = m. 


4 This is an example of reformulating an implication using its contrapositive; see Sect. A.2 for more 
details. In the converse direction, if n = m, then n++ = m++; this is the axiom of substitution (see
Sect. A.7) applied to the operation ++.
Thus, for instance, we have
Proposition 2.1.8 6 is not equal to 2. 


Proof Suppose for sake of contradiction that 6 = 2. Then 5++ = 1++, so by Axiom
2.4 we have 5 = 1, so that 4++ = 0++. By Axiom 2.4 again we then have 4 = 0, 
which contradicts our previous proposition. 
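
Axioms 2.3 and 2.4 can be checked mechanically on the finite systems of Examples 2.1.5 and 2.1.7. In the following sketch, each system is encoded as a dictionary sending every element to its successor; this encoding and the function names are assumptions made for illustration only.

```python
def satisfies_axiom_2_3(successor: dict) -> bool:
    """Axiom 2.3: 0 is not the successor of any element."""
    return all(s != 0 for s in successor.values())

def satisfies_axiom_2_4(successor: dict) -> bool:
    """Axiom 2.4: different elements have different successors."""
    values = list(successor.values())
    return len(values) == len(set(values))

wrap_to_zero = {0: 1, 1: 2, 2: 3, 3: 0}       # Example 2.1.5
ceiling = {0: 1, 1: 2, 2: 3, 3: 4, 4: 4}      # Example 2.1.7, first system
wrap_to_one = {0: 1, 1: 2, 2: 3, 3: 4, 4: 1}  # Example 2.1.7, second system

for name, system in [("wrap_to_zero", wrap_to_zero),
                     ("ceiling", ceiling),
                     ("wrap_to_one", wrap_to_one)]:
    print(name, satisfies_axiom_2_3(system), satisfies_axiom_2_4(system))
```

The first system fails Axiom 2.3 but passes Axiom 2.4, while the other two pass Axiom 2.3 and fail Axiom 2.4. No finite system of this kind can pass both checks: a one-to-one map from a finite set to itself is onto, so 0 would have to be the successor of something.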


As one can see from this proposition, it now looks like we can keep all of the natural 
numbers distinct from each other. There is however still one more problem: while the 
axioms (particularly Axioms 2.3 and 2.4) allow us to confirm that 0, 1, 2, 3, ... are
distinct elements of N, there is the problem that there may be other “rogue” elements 
in our number system which are not of this form: 


Example 2.1.9 (Informal) Suppose that our number system N consisted of the fol- 
lowing collection of integers and half-integers: 


N := {0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, ...}. 


(This example is marked “informal” since we are using real numbers, which we’re 
not supposed to use yet.) One can check that Axioms 2.1–2.4 are still satisfied for
this set. 


What we want is some axiom which says that the only numbers in N are those 
which can be obtained from 0 and the increment operation—in order to exclude 
elements such as 0.5. But it is difficult to quantify what we mean by “can be obtained 
from” without already using the natural numbers, which we are trying to define. 
Fortunately, there is an ingenious solution to try to capture this fact: 


Axiom 2.5 (Principle of mathematical induction). Let P(n) be any property pertaining
to a natural number n. Suppose that P(0) is true, and suppose that whenever
P(n) is true, P(n++) is also true. Then P(n) is true for every natural number n.


Remark 2.1.10 We are a little vague on what “property” means at this point, but 
some possible examples of P(n) might be “n is even”; “n is equal to 3”; “n solves the
equation (n + 1)² = n² + 2n + 1”; and so forth. Of course we haven’t defined many
of these concepts yet, but when we do, Axiom 2.5 will apply to these properties. (A 
logical remark: Because this axiom refers not just to variables, but also properties, it is 
of a different nature than the other four axioms; indeed, Axiom 2.5 should technically 
be called an axiom schema rather than an axiom; it is a template for producing an
(infinite) number of axioms, rather than being a single axiom in its own right. To 
discuss this distinction further is far beyond the scope of this text, though, and falls 
in the realm of mathematical logic.) 


The informal intuition behind this axiom is the following. Suppose P(n) is such 
that P(0) is true, and such that whenever P(n) is true, then P(n++) is true. Then
since P(0) is true, P(0++) = P(1) is true. Since P(1) is true, P(1++) = P(2)
is true. Repeating this indefinitely, we see that P(0), P(1), P(2), P(3), etc., are
all true—however this line of reasoning will never let us conclude that P(0.5), for 
instance, is true. Thus Axiom 2.5 should not hold for number systems which contain 
“unnecessary” elements such as 0.5. (Indeed, one can give a “proof” of this fact as 
follows. Apply Axiom 2.5 to the property P(n) = “n is not a half-integer”, where a half-integer means an
integer plus 0.5. Then P(0) is true, and if P(n) is true, then P(n++) is true. Thus
Axiom 2.5 asserts that P(n) is true for all natural numbers n, i.e., no natural number
can be a half-integer. In particular, 0.5 cannot be a natural number. This “proof” 
is not quite genuine, because we have not defined such notions as “integer”, “half- 
integer”, and “0.5” yet, but it should give you some idea as to how the principle of 
induction is supposed to prohibit any numbers other than the “true” natural numbers 
from appearing in N.) 
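
The phrase “can be obtained from 0 by incrementing” can be illustrated, though of course not proved, by a finite computation. The sketch below works inside the informal system of Example 2.1.9, taking n++ to mean n + 1 on exact fractions; the function name and the cut-off `steps` are assumptions of the sketch.

```python
from fractions import Fraction

def reachable_from_zero(steps: int) -> set:
    """Elements obtained from 0 by at most `steps` increments (n++ taken as n + 1)."""
    current = Fraction(0)
    reached = {current}
    for _ in range(steps):
        current = current + 1
        reached.add(current)
    return reached

reached = reachable_from_zero(100)
print(Fraction(1, 2) in reached)                  # prints: False
print(all(x.denominator == 1 for x in reached))   # prints: True
```

However many increments are allowed, the half-integers are never reached from 0; these are the “unnecessary” elements that Axiom 2.5 is designed to exclude.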

The principle of induction gives us a way to prove that a property P(n) is true for
every natural number n. Thus in the rest of this text we will see many proofs which 
have a form like this: 


Proposition Template 2.1.11 A certain property P(n) is true for every natural number n.


Proof Template We use induction. We first verify the base case n = 0, i.e., we prove 
P(0). (Insert proof of P(0) here.) Now suppose inductively that n is a natural number,
and P(n) has already been proven. We now prove P(n++). (Insert proof of P(n++),
assuming that P(n) is true, here.) This closes the induction, and thus P(n) is true 
for all numbers n. 


Of course we will not necessarily use the exact template, wording, or order in the 
above type of proof, but the proofs using induction will generally be something 
like the above form. There are also some other variants of induction which we 
shall encounter later, such as backwards induction (Exercise 2.2.6), strong induction 
(Proposition 2.2.14), and transfinite induction (Lemma 8.5.15). 

Axioms 2.1–2.5 are known as the Peano axioms for the natural numbers. They
are all very plausible, and so we shall make 


Assumption 2.6 (Informal) There exists a number system N, whose elements we 
will call natural numbers, for which Axioms 2.1–2.5 are true.


We will make this assumption a bit more precise once we have laid down our 
notation for sets and functions in the next chapter. 


Remark 2.1.12 We will refer to this number system N as the natural number system. 
One could of course consider the possibility that there is more than one natural number 
system, e.g., we could have the Hindu-Arabic number system {0, 1, 2, 3, ...} and
the Roman number system {O, I, II, III, IV, V, VI, ...} (augmented by adding a
zero symbol O), and if we really wanted to be annoying we could view these number
systems as different. But these number systems are clearly equivalent (the technical
term is isomorphic), because one can create a one-to-one correspondence 0 ↔ O,
1 ↔ I, 2 ↔ II, etc., which maps the zero of the Hindu-Arabic system to the zero
of the Roman system, and which is preserved by the increment operation (e.g., if 2
corresponds to II, then 2++ will correspond to II++). For a more precise statement
of this type of equivalence, see Exercise 3.5.13. Since all versions of the natural 
number system are equivalent, there is no point in having distinct natural number 
systems, and we will just use a single natural number system to do mathematics. 


We will not prove Assumption 2.6 (though we will eventually include it in our 
axioms for set theory, see Axiom 3.8), and it will be the only assumption we will ever 
make about our numbers. A remarkable accomplishment of modern analysis is that 
just by starting from these five very primitive axioms, and some additional axioms 
from set theory, we can build all the other number systems, create functions, and do 
all the algebra and calculus that we are used to. 


Remark 2.1.13 (Informal) One interesting feature about the natural numbers is that 
while each individual natural number is finite, the set of natural numbers is infinite; 
i.e., N is infinite but consists of individually finite elements. (The whole is greater 
than any of its parts.) There are no infinite natural numbers; one can even prove this 
using Axiom 2.5, provided one is comfortable with the notions of finite and infinite. 
(Clearly 0 is finite. Also, if n is finite, then clearly n++ is also finite. Hence by Axiom
2.5, all natural numbers are finite.) So the natural numbers can approach infinity, but 
never actually reach it; infinity is not one of the natural numbers. (There are other 
number systems which admit “infinite” numbers, such as the cardinals, ordinals, and 
p-adics, but they do not obey the principle of induction, and in any event are beyond 
the scope of this text.) 


Remark 2.1.14 Note that our definition of the natural numbers is axiomatic rather 
than constructive. We have not told you what the natural numbers are (so we do not 
address such questions as what the numbers are made of, are they physical objects, 
what do they measure, etc.)—we have only listed some things you can do with them 
(in fact, the only operation we have defined on them right now is the increment 
one) and some of the properties that they have. This is how mathematics works—it 
treats its objects abstractly, caring only about what properties the objects have, not 
what the objects are or what they mean. If one wants to do mathematics, it does 
not matter whether a natural number means a certain arrangement of beads on an 
abacus, or a certain organization of bits in a computer’s memory, or some more 
abstract concept with no physical substance; as long as you can increment them, see 
if two of them are equal, and later on do other arithmetic operations such as add and 
multiply, they qualify as numbers for mathematical purposes (provided they obey 
the requisite axioms, of course). It is possible to construct the natural numbers from 
other mathematical objects—from sets, for instance—but there are multiple ways to 
construct a working model of the natural numbers, and it is pointless, at least from 
a mathematician’s standpoint, to argue about which model is the “true” one; as
long as it obeys all the axioms and does all the right things, that’s good enough to do 
maths. 


Remark 2.1.15 Historically, the realization that numbers could be treated axiomat- 
ically is very recent, not much more than a hundred years old. Before then, numbers 
were generally understood to be inextricably connected to some external concept,
such as counting the cardinality of a set, measuring the length of a line segment, or the 
mass of a physical object. This worked reasonably well, until one was forced to move 
from one number system to another; for instance, understanding numbers in terms
of counting beads is great for conceptualizing the numbers 3 and 5, but
doesn’t work so well for −3 or 1/3 or √2 or 3 + 4i; thus each great advance in the
theory of numbers—negative numbers, irrational numbers, complex numbers, even 
the number zero—led to a lot of unnecessary philosophical anguish. The great dis- 
covery of the late nineteenth century was that numbers can be understood abstractly 
via axioms, without necessarily needing a concrete model; of course a mathematician 
can use any of these models when it is convenient, to aid his or her intuition and under- 
standing, but they can also be just as easily discarded when they begin to get in the 
way. 


One consequence of the axioms is that we can now define sequences recursively. 
Suppose we want to build a sequence a_0, a_1, a_2, ... of numbers by first defining a_0
to be some base value, e.g., a_0 := c for some number c, and then by letting a_1 be
some function of a_0, a_1 := f_0(a_0), a_2 be some function of a_1, a_2 := f_1(a_1), and so
forth. In general, we set a_{n++} := f_n(a_n) for some function f_n from N to N. By using
all the axioms together we will now conclude that this procedure will give a single
value to the sequence element a_n for each natural number n. More precisely⁵:


Proposition 2.1.16 (Recursive definitions). Suppose for each natural number n, we
have some function f_n : N → N from the natural numbers to the natural numbers.
Let c be a natural number. Then we can assign a unique natural number a_n to each
natural number n, such that a_0 = c and a_{n++} = f_n(a_n) for each natural number n.


Proof (Informal) We use induction. We first observe that this procedure gives a
single value to a_0, namely c. (None of the other definitions a_{n++} := f_n(a_n) will
redefine the value of a_0, because of Axiom 2.3.) Now suppose inductively that
the procedure gives a single value to a_n. Then it gives a single value to a_{n++},
namely a_{n++} := f_n(a_n). (None of the other definitions a_{m++} := f_m(a_m) will redefine
the value of a_{n++}, because of Axiom 2.4.) This completes the induction,
and so a_n is defined for each natural number n, with a single value assigned to
each a_n.


Note how all of the axioms had to be used here. In a system which had some sort 
of wrap-around, recursive definitions would not work because some elements of the 
sequence would constantly be redefined. For instance, in Example 2.1.5, in which 
3++ = 0, there would be (at least) two conflicting definitions for a_0, either c
or f_3(a_3). In a system which had superfluous elements such as 0.5, the element a_{0.5}
would never be defined. 
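
In programming terms, a recursive definition of this kind is a loop that starts from the base value and applies the functions f_0, f_1, ... in turn. The sketch below is informal: the name `recursive_sequence` is hypothetical, the whole family of functions f_n is passed as a single two-argument function `f(n, a)`, and Python's built-in integers stand in for N.

```python
from typing import Callable

def recursive_sequence(c: int, f: Callable[[int, int], int], n: int) -> int:
    """Return a_n, where a_0 = c and a_{k++} = f_k(a_k) for each k."""
    a = c
    for k in range(n):
        a = f(k, a)  # f(k, a) plays the role of f_k(a)
    return a

# a_0 = 1 and a_{k++} = (k++) * a_k gives the factorials 1, 1, 2, 6, 24, 120, ...
print(recursive_sequence(1, lambda k, a: (k + 1) * a, 5))  # prints: 120

# a_0 = 5 and a_{k++} = a_k++ increments 5 three times.
print(recursive_sequence(5, lambda k, a: a + 1, 3))        # prints: 8
```

Each a_n receives exactly one value here because the loop visits 0, 1, 2, ... without repetition, which is what Axioms 2.3 and 2.4 guarantee in the abstract setting.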


5 Strictly speaking, this proposition requires one to define the notion of a function, which we shall 
do in the next chapter. However, this will not be circular, as the concept of a function does not 
require the Peano axioms. Proposition 2.1.16 can be formalized more rigorously in the language of 
set theory; see Exercise 3.5.12. 
Recursive definitions are very powerful; for instance, we can use them to define
addition and multiplication, to which we now turn. 


2.2 Addition 


The natural number system is very bare right now: we have only one operation— 
incrementation—and a handful of axioms. But now we can build up more complex 
operations, such as addition. 

The way it works is the following. To add three to five should be the same as 
incrementing five three times—this is one increment more than adding two to five, 
which is one increment more than adding one to five, which is one increment more 
than adding zero to five, which should just give five. So we give a recursive definition 
for addition as follows. 


Definition 2.2.1 (Addition of natural numbers). Let m be a natural number. To add 
zero to m, we define 0 + m := m. Now suppose inductively that we have defined how
to add n to m. Then we can add n++ to m by defining (n++) + m := (n + m)++.


Thus 0 + m is m, 1 + m = (0++) + m is m++; 2 + m = (1++) + m = (m++)++;
and so forth; for instance we have 2 + 3 = (3++)++ = 4++ = 5. From our
discussion of recursion in the previous section we see that we have defined n + m
for every natural number n. Here we are specializing the previous general discussion
to the setting where a_n = n + m and f_n(a_n) = a_n++. Note that this definition is
asymmetric: 3 + 5 is incrementing 5 three times, while 5 + 3 is incrementing 3
five times. Of course, they both yield the same value of 8. More generally, it is a
fact (which we shall prove shortly) that a + b = b + a for all natural numbers a, b, 
although this is not immediately clear from the definition. 
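
The recursion in Definition 2.2.1 can be sketched in Python. This is only an illustration under stated assumptions: natural numbers are modelled by Python's non-negative integers, the helper names succ and add are ours, and n - 1 is used merely as a convenient way to recover k from n = k++.

```python
def succ(n):
    """The increment operation n++ (modelled with Python integers)."""
    return n + 1


def add(n, m):
    """Compute n + m using only the two clauses of Definition 2.2.1."""
    if n == 0:
        return m                    # 0 + m := m
    # Here n = k++ with k = n - 1, and (k++) + m := (k + m)++.
    return succ(add(n - 1, m))


# 3 + 5 increments 5 three times; 5 + 3 increments 3 five times.
three_plus_five = add(3, 5)
five_plus_three = add(5, 3)
```

The two calls unwind differently, since the recursion is on the first argument, yet both return the same value, which reflects the asymmetry of the definition noted above.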

Notice that we can prove easily, using Axioms 2.1, 2.2, and induction (Axiom 
2.5), that the sum of two natural numbers is again a natural number (why?). 

Right now we only have two facts about addition: that 0 + m = m, and that 
(n++) + m = (n + m)++. Remarkably, this turns out to be enough to deduce
everything else we know about addition. We begin with some basic lemmas.⁶


Lemma 2.2.2 For any natural number n, n + 0 = n.


Note that we cannot deduce this immediately from 0 + m = m because we have 
not yet established the commutative property a + b = b + a of addition. 


6 From a logical point of view, there is no difference between a lemma, proposition, theorem,
or corollary—they are all claims waiting to be proved. However, we use these terms to suggest 
different levels of importance and difficulty. A lemma is an easily proved claim which is helpful 
for proving other propositions and theorems, but is usually not particularly interesting in its own 
right. A proposition is a statement which is interesting in its own right, while a theorem is a more 
important statement than a proposition which says something definitive on the subject, and often 
takes more effort to prove than a proposition or lemma. A corollary is a quick consequence of a 
proposition or theorem that was proven recently. 
Proof We use induction. The base case 0 + 0 = 0 follows since we know that
0 + m = m for every natural number m, and 0 is a natural number. Now suppose
inductively that n + 0 = n. We wish to show that (n++) + 0 = n++. But by
definition of addition, (n++) + 0 is equal to (n + 0)++, which is equal to n++ since
n + 0 = n. This closes the induction.


Lemma 2.2.3 For any natural numbers n and m, n + (m++) = (n + m)++.


Again, we cannot deduce this yet from (n++) + m = (n + m)++ because we do
not know yet that a + b = b + a.


Proof We induct on n (keeping m fixed). We first consider the base case n = 0. In
this case we have to prove 0 + (m++) = (0 + m)++. But by definition of addition,
0 + (m++) = m++ and 0 + m = m, so both sides are equal to m++ and are thus
equal to each other. Now we assume inductively that n + (m++) = (n + m)++; we
now have to show that (n++) + (m++) = ((n++) + m)++. The left-hand side is
(n + (m++))++ by definition of addition, which is equal to ((n + m)++)++ by the
inductive hypothesis. Similarly, we have (n++) + m = (n + m)++ by the definition
of addition, and so the right-hand side is also equal to ((n + m)++)++. Thus both
sides are equal to each other, and we have closed the induction.


As a particular corollary of Lemma 2.2.2 and Lemma 2.2.3 we see that n++ =
n + 1 (why?).
As promised earlier, we can now prove that a + b = b + a.


Proposition 2.2.4 (Addition is commutative). For any natural numbers n and m, 
n + m = m + n.


Proof We shall use induction on n (keeping m fixed). First we do the base case n = 0,
i.e., we show 0 + m = m + 0. By the definition of addition, 0 + m = m, while by
Lemma 2.2.2, m + 0 = m. Thus the base case is done. Now suppose inductively
that n + m = m + n, now we have to prove that (n++) + m = m + (n++) to close
the induction. By the definition of addition, (n++) + m = (n + m)++. By Lemma
2.2.3, m + (n++) = (m + n)++, but this is equal to (n + m)++ by the inductive
hypothesis n + m = m + n. Thus (n++) + m = m + (n++) and we have closed the
induction.


Proposition 2.2.5 (Addition is associative). For any natural numbers a, b,c, we 
have (a + b) + c = a + (b + c).


Proof See Exercise 2.2.1. 


Because of this associativity we can write sums such as a + b + c without having 
to worry about which order the numbers are being added together. 
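
As a sanity check (emphatically not a proof, since only finitely many cases are tested), one can verify Lemma 2.2.2, Lemma 2.2.3, and Propositions 2.2.4 and 2.2.5 on small inputs. The sketch below assumes that natural numbers are modelled by Python's non-negative integers; the names succ, add and addition_laws_hold are hypothetical helpers introduced for this illustration.

```python
from itertools import product


def succ(n):
    """The increment operation n++."""
    return n + 1


def add(n, m):
    """Addition as in Definition 2.2.1 (recursion on the first argument)."""
    return m if n == 0 else succ(add(n - 1, m))


def addition_laws_hold(bound):
    """Check the four identities for all natural numbers below `bound`."""
    small = range(bound)
    for n, m in product(small, repeat=2):
        if add(n, 0) != n:                        # Lemma 2.2.2
            return False
        if add(n, succ(m)) != succ(add(n, m)):    # Lemma 2.2.3
            return False
        if add(n, m) != add(m, n):                # Proposition 2.2.4
            return False
    for a, b, c in product(small, repeat=3):
        if add(add(a, b), c) != add(a, add(b, c)):  # Proposition 2.2.5
            return False
    return True
```

A call such as addition_laws_hold(6) only confirms the identities for inputs below 6; the inductive proofs are what establish them for all natural numbers.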
Now we develop a cancellation law. 


Proposition 2.2.6 (Cancellation law). Let a, b, c be natural numbers such that a +
b = a + c. Then we have b = c.
Note that we cannot use subtraction or negative numbers yet to prove this proposition,
because we have not developed these concepts yet. In fact, this cancellation
law is crucial in letting us define subtraction (and the integers) later on in this text, 
because it allows for a sort of “virtual subtraction” even before subtraction is officially 
defined. 


Proof We prove this by induction on a. First consider the base case a = 0. Then we
have 0 + b = 0 + c, which by definition of addition implies that b = c as desired.
Now suppose inductively that we have the cancellation law for a (so that a + b =
a + c implies b = c); we now have to prove the cancellation law for a++. In other
words, we assume that (a++) + b = (a++) + c and need to show that b = c. By
the definition of addition, (a++) + b = (a + b)++ and (a++) + c = (a + c)++
and so we have (a + b)++ = (a + c)++. By Axiom 2.4, we have a + b = a + c.
Since we already have the cancellation law for a, we thus have b = c as desired. This
closes the induction.


We now discuss how addition interacts with positivity. 


Definition 2.2.7 (Positive natural numbers). A natural number n is said to be positive 
iff it is not equal to 0. (“iff” is shorthand for “if and only if”; see Sect. A.1.) 


Proposition 2.2.8 If a is a positive natural number, and b is a natural number, then
a + b is positive (and hence b + a is also, by Proposition 2.2.4). 


Proof We use induction on b. If b = 0, then a + b = a + 0 = a, which is positive,
so this proves the base case. Now suppose inductively that a + b is positive. Then
a + (b++) = (a + b)++, which cannot be zero by Axiom 2.3, and is hence positive.
This closes the induction. 


Corollary 2.2.9 If a and b are natural numbers such that a+ b = 0, then a = 0 
and b = 0. 


Proof Suppose for sake of contradiction that a ≠ 0 or b ≠ 0. If a ≠ 0 then a is positive,
and hence a + b = 0 is positive by Proposition 2.2.8, a contradiction. Similarly
if b ≠ 0 then b is positive, and again a + b = 0 is positive by Proposition 2.2.8, a
contradiction. Thus a and b must both be zero.


Lemma 2.2.10 Let a be a positive number. Then there exists exactly one natural 
number b such that b++ = a. 


Proof See Exercise 2.2.2. 


Once we have a notion of addition, we can begin defining a notion of order. 


Definition 2.2.11 (Ordering of the natural numbers) Let n and m be natural numbers.
We say that n is greater than or equal to m, and write n ≥ m or m ≤ n, iff we
have n = m + a for some natural number a. We say that n is strictly greater than m,
and write n > m or m < n, iff n ≥ m and n ≠ m.
Thus for instance 8 > 5, because 8 = 5 + 3 and 8 ≠ 5. Also note that n++ > n
for any n; thus there is no largest natural number n, because the next number n++
is always larger still.
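
Definition 2.2.11 can be read as a search for a witness a with n = m + a. The following is a minimal sketch, assuming Python's non-negative integers and its built-in + stand in for the natural numbers and their addition; the function names geq and gt are ours. The search may stop at a = n, because m + a is never smaller than a.

```python
def geq(n, m):
    """n >= m iff n = m + a for some natural number a (Definition 2.2.11)."""
    # Any witness a satisfies a <= n, so a finite search suffices.
    return any(m + a == n for a in range(n + 1))


def gt(n, m):
    """n > m iff n >= m and n != m."""
    return geq(n, m) and n != m


eight_exceeds_five = gt(8, 5)     # witness a = 3, and 8 != 5
five_exceeds_eight = gt(5, 8)     # no witness exists
```

Under these assumptions gt(n + 1, n) holds for every n one tries, in line with the remark that there is no largest natural number.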


Proposition 2.2.12 (Basic properties of order for natural numbers). Let a, b, c be 
natural numbers. Then 


(a) (Order is reflexive) a ≥ a.

(b) (Order is transitive) If a ≥ b and b ≥ c, then a ≥ c.

(c) (Order is antisymmetric) If a ≥ b and b ≥ a, then a = b.

(d) (Addition preserves order) a ≥ b if and only if a + c ≥ b + c.

(e) a < b if and only if a++ ≤ b.

(f) a < b if and only if b = a + d for some positive number d.


Proof See Exercise 2.2.3. 


Proposition 2.2.13 (Trichotomy of order for natural numbers). Let a and b be nat- 
ural numbers. Then exactly one of the following statements is true: a < b, a = b, or 
a> b. 


Proof This is only a sketch of the proof; the gaps will be filled in Exercise 2.2.4. 

First we show that we cannot have more than one of the statements a < b, a = b,
a > b holding at the same time. If a < b then a ≠ b by definition, and if a > b then
a ≠ b by definition. If a > b and a < b then by Proposition 2.2.12 we have a = b,
a contradiction. Thus no more than one of the statements is true.

Now we show that at least one of the statements is true. We keep b fixed and
induct on a. When a = 0 we have 0 ≤ b for all b (why?), so we have either 0 = b
or 0 < b, which proves the base case. Now suppose we have proven the proposition
for a, and now we prove the proposition for a++. From the trichotomy for a, there
are three cases: a < b, a = b, and a > b. If a > b, then a++ > b (why?). If a = b,
then a++ > b (why?). Now suppose that a < b. Then by Proposition 2.2.12, we
have a++ ≤ b. Thus either a++ = b or a++ < b, and in either case we are done.
This closes the induction. 


The properties of order allow one to obtain a stronger version of the principle of 
induction: 


Proposition 2.2.14 (Strong principle of induction). Let m_0 be a natural number, and
let P(m) be a property pertaining to an arbitrary natural number m. Suppose that
for each m ≥ m_0, we have the following implication: if P(m') is true for all natural
numbers m_0 ≤ m' < m, then P(m) is also true. (In particular, this means that P(m_0)
is true, since in this case the hypothesis is vacuous.) Then we can conclude that P(m)
is true for all natural numbers m ≥ m_0.


Remark 2.2.15 In applications we usually use this principle with m_0 = 0 or m_0 = 1.


Proof See Exercise 2.2.5. 
— Exercises —
Exercise 2.2.1 Prove Proposition 2.2.5. (Hint: fix two of the variables and induct on the third.) 


Exercise 2.2.2 Prove Lemma 2.2.10. (Hint: use induction. The induction here is somewhat degen- 
erate, in that the induction hypothesis is not actually used, but this does not prevent the argument 
from remaining valid; cf. the discussion on implication and causality in Appendix A.2.) 


Exercise 2.2.3 Prove Proposition 2.2.12. (Hint: you will need many of the preceding propositions, 
corollaries, and lemmas.) 


Exercise 2.2.4 Justify the three statements marked (why?) in the proof of Proposition 2.2.13. 


Exercise 2.2.5 Prove Proposition 2.2.14. (Hint: define Q(n) to be the property that P(m) is true
for all m_0 ≤ m < n; note that Q(n) is vacuously true when n < m_0.)


Exercise 2.2.6 Let n be a natural number, and let P(m) be a property pertaining to the natural 
numbers such that whenever P(m++) is true, then P(m) is true. Suppose that P(n) is also true.
Prove that P(m) is true for all natural numbers m ≤ n; this is known as the principle of backwards
induction. (Hint: apply induction to the variable n.) 


Exercise 2.2.7 Let n be a natural number, and let P(m) be a property pertaining to the natural
numbers such that whenever P(m) is true, P(m++) is true. Show that if P(n) is true, then P(m)
is true for all m ≥ n. (This principle is sometimes referred to as the principle of induction starting
from the base case n.) 


2.3 Multiplication 


In the previous section we have proven all the basic facts that we know to be true 
about addition and order. To save space and to avoid belaboring the obvious, we 
will now allow ourselves to use all the rules of algebra concerning addition and 
order that we are familiar with, without further comment. Thus for instance we may 
write things like a + b + c = c + b + a without supplying any further justification.
Now we introduce multiplication. Just as addition is the iterated increment operation, 
multiplication is iterated addition: 


Definition 2.3.1 (Multiplication of natural numbers). Let m be a natural number.
To multiply zero to m, we define 0 × m := 0. Now suppose inductively that we
have defined how to multiply n to m. Then we can multiply n++ to m by defining
(n++) × m := (n × m) + m.


Thus for instance 0 × m = 0, 1 × m = 0 + m, 2 × m = 0 + m + m, etc. By
induction one can easily verify that the product of two natural numbers is a natural
number.
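
Just as for addition, the recursion in Definition 2.3.1 can be sketched in Python. The sketch assumes natural numbers are modelled by non-negative Python integers; the names add and mul are ours, and n - 1 is again only a device for recovering k from n = k++.

```python
def add(n, m):
    """Addition as in Definition 2.2.1."""
    return m if n == 0 else add(n - 1, m) + 1


def mul(n, m):
    """Compute n × m using only the two clauses of Definition 2.3.1."""
    if n == 0:
        return 0                       # 0 × m := 0
    # Here n = k++ with k = n - 1, and (k++) × m := (k × m) + m.
    return add(mul(n - 1, m), m)


# 2 × m unwinds to (0 + m) + m, i.e. iterated addition.
two_times_seven = mul(2, 7)
```

As with addition, the recursion runs on the first argument only, so mul(3, 4) and mul(4, 3) are computed by different chains of additions.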


Lemma 2.3.2 (Multiplication is commutative). Let n, m be natural numbers. Then
n × m = m × n.


Proof See Exercise 2.3.1. 
We will now abbreviate n × m as nm and use the usual convention that multiplication
takes precedence over addition, thus for instance ab + c means (a × b) + c, not
a × (b + c). (We will also use the usual notational conventions of precedence for the
other arithmetic operations when they are defined later, to save on using parentheses 
all the time.) 


Lemma 2.3.3 (Positive natural numbers have no zero divisors). Let n, m be natural
numbers. Then n × m = 0 if and only if at least one of n, m is equal to zero. In
particular, if n and m are both positive, then nm is also positive.


Proof See Exercise 2.3.2. 


Proposition 2.3.4 (Distributive law). For any natural numbers a, b, c, we have
a(b + c) = ab + ac and (b + c)a = ba + ca.


Proof Since multiplication is commutative we only need to show the first identity
a(b + c) = ab + ac. We keep a and b fixed, and use induction on c. Let’s prove
the base case c = 0, i.e., a(b + 0) = ab + a0. The left-hand side is ab, while the
right-hand side is ab + 0 = ab, so we are done with the base case. Now let us
suppose inductively that a(b + c) = ab + ac, and let us prove that a(b + (c++)) =
ab + a(c++). The left-hand side is a((b + c)++) = a(b + c) + a, while the
right-hand side is ab + ac + a = a(b + c) + a by the induction hypothesis, and so we
can close the induction.


Proposition 2.3.5 (Multiplication is associative). For any natural numbers a, b, c,
we have (a × b) × c = a × (b × c).


Proof See Exercise 2.3.3. 


Proposition 2.3.6 (Multiplication preserves order). If a, b are natural numbers such
that a < b, and c is positive, then ac < bc.


Proof Since a < b, we have b = a +d for some positive d. Multiplying by c and 
using the distributive law we obtain bc = ac + dc. Since d is positive, and c is 
positive, dc is positive, and hence ac < bc as desired.


Corollary 2.3.7 (Cancellation law). Let a, b, c be natural numbers such that ac = 
bc and c is non-zero. Then a = b. 


Remark 2.3.8 Just as Proposition 2.2.6 will allow for a “virtual subtraction” which 
will eventually let us define genuine subtraction, this corollary provides a “virtual 
division” which will be needed to define genuine division later on. 


Proof By the trichotomy of order (Proposition 2.2.13), we have three cases: a < b,
a = b, a > b. Suppose first that a < b, then by Proposition 2.3.6 we have ac < bc,
a contradiction. We can obtain a similar contradiction when a > b. Thus the only 
possibility is that a = b, as desired. 
With these propositions it is easy to deduce all the familiar rules of algebra involving
addition and multiplication, see for instance Exercise 2.3.4.

Now that we have the familiar operations of addition and multiplication, the more 
primitive notion of increment will begin to fall by the wayside, and we will see it rarely 
from now on. In any event we can always use addition to describe incrementation, 
since n++ = n + 1.


Proposition 2.3.9 (Euclid’s division lemma). Let n be a natural number, and let q
be a positive number. Then there exist natural numbers m, r such that 0 ≤ r < q and
n = mq + r.


Remark 2.3.10 In other words, we can divide a natural number n by a positive
number q to obtain a quotient m (which is another natural number) and a remainder
r (which is less than q). This algorithm marks the beginning of number theory, which
is a beautiful and important subject but one which is beyond the scope of this text. 


Proof See Exercise 2.3.5. 
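
One possible way to read Proposition 2.3.9 computationally is to recurse on n with q fixed, passing from a quotient and remainder for n - 1 to one for n. The sketch below is our own illustration under the assumption that natural numbers are modelled by non-negative Python integers; the function name divide is hypothetical, and the code is not a substitute for the proof requested in the exercise.

```python
def divide(n, q):
    """Return (m, r) with n = m*q + r and 0 <= r < q, for natural n and positive q."""
    if q == 0:
        raise ValueError("q must be positive")
    if n == 0:
        return (0, 0)                  # 0 = 0*q + 0, and 0 < q
    m, r = divide(n - 1, q)            # n - 1 = m*q + r with 0 <= r < q
    if r + 1 < q:
        return (m, r + 1)              # the remainder still fits below q
    return (m + 1, 0)                  # r + 1 = q, so n = (m + 1)*q + 0


quotient, remainder = divide(17, 5)
```

The two branches of the final step correspond to the two cases r++ < q and r++ = q that arise when the remainder is incremented.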


Just like one uses the increment operation to recursively define addition, and 
addition to recursively define multiplication, one can use multiplication to recursively 
define exponentiation: 


Definition 2.3.11 (Exponentiation for natural numbers). Let m be a natural number.
To raise m to the power 0, we define m^0 := 1; in particular, we define 0^0 := 1. Now
suppose recursively that m^n has been defined for some natural number n, then we
define m^(n++) := m^n × m.


Examples 2.3.12 Thus for instance x^1 = x^0 × x = 1 × x = x; x^2 = x^1 × x = x × x;
x^3 = x^2 × x = x × x × x; and so forth. By induction we see that this recursive
definition defines x^n for all natural numbers n.
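
The recursive definition of exponentiation can be sketched in the same style. We assume non-negative Python integers model the natural numbers and use Python's built-in * for multiplication; the name power is ours.

```python
def power(m, n):
    """Compute m^n by recursion on n, following Definition 2.3.11."""
    if n == 0:
        return 1                       # m^0 := 1, including 0^0 := 1
    # Here n = k++ with k = n - 1, and m^(k++) := m^k × m.
    return power(m, n - 1) * m


zero_to_the_zero = power(0, 0)
two_cubed = power(2, 3)
```

Note that the convention 0^0 := 1 falls out of the base case, since the base of the power is never inspected when n = 0.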


We will not develop the theory of exponentiation too deeply here, but instead 
wait until after we have defined the integers and rational numbers; see in particular 
Proposition 4.3.10. 


— Exercises — 


Exercise 2.3.1 Prove Lemma 2.3.2. (Hint: modify the proofs of Lemmas 2.2.2, 2.2.3 and Proposi- 
tion 2.2.4.) 


Exercise 2.3.2 Prove Lemma 2.3.3. (Hint: prove the second statement first.) 


Exercise 2.3.3 Prove Proposition 2.3.5. (Hint: modify the proof of Proposition 2.2.5 and use the 
distributive law.) 


Exercise 2.3.4 Prove the identity (a + b)^2 = a^2 + 2ab + b^2 for all natural numbers a, b.


Exercise 2.3.5 Prove Proposition 2.3.9. (Hint: fix q and induct on n.) 


Chapter 3
Set Theory


Modern analysis, like most other subfields of modern mathematics, is concerned 
with numbers, sets, and geometry. We have already introduced one type of number 
system, the natural numbers. We will introduce the other number systems shortly, 
but for now we pause to introduce the concepts and notation of set theory, as they 
will be used increasingly heavily in later chapters. (We will not pursue a rigorous 
description of Euclidean geometry in this text, preferring instead to describe that 
geometry in terms of the real number system by means of the Cartesian co-ordinate 
system.) 

While set theory is not the main focus of this text, almost every other branch of 
mathematics relies on set theory as part of its foundation, so it is important to get at 
least some grounding in set theory before doing other advanced areas of mathematics. 
In this chapter we present the more elementary aspects of axiomatic set theory, leaving 
more advanced topics such as a discussion of infinite sets and the axiom of choice 
to Chap. 8. A full treatment of the finer subtleties of set theory (of which there are 
many!) is unfortunately well beyond the scope of this text. 


3.1 Fundamentals 


In this section we shall set out some axioms for sets, just as we did for the natural num- 
bers. For pedagogical reasons, we will use a somewhat overcomplete list of axioms for 
set theory, in the sense that some of the axioms can be used to deduce others, but there 
is no real harm in doing this. We begin with an informal description of what sets should
be. 


Definition 3.1.1 (Informal) We define a set A to be any unordered collection of
objects, e.g., {3, 8, 5, 2} is a set. If x is an object, we say that x is an element of
A or x ∈ A if x lies in the collection; otherwise we say that x ∉ A. For instance,
3 ∈ {1, 2, 3, 4, 5} but 7 ∉ {1, 2, 3, 4, 5}.


This definition is intuitive enough, but it doesn’t answer a number of questions, 
such as which collections of objects are considered to be sets, which sets are equal to 
other sets, and how one defines operations on sets (e.g., unions, intersections, etc.). 
Also, we have no axioms yet on what sets do, or what their elements do. Obtaining 
these axioms and defining these operations will be the purpose of the remainder of 
this section. 

We first clarify one point: we consider sets themselves to be a type of object. 


Axiom 3.1 (Sets are objects). If A is a set, then A is also an object. In particular, 
given two sets A and B, it is meaningful to ask whether A is also an element of B. 


Example 3.1.2 (Informal) The set {3, {3, 4}, 4} is a set of three distinct elements, 
one of which happens to itself be a set of two elements. See Example 3.1.9 for a 
more formal version of this example. 


Remark 3.1.3. There is a special case of set theory, called “pure set theory”, in which 
all objects are sets; for instance the number 0 might be identified with the empty
set ∅ = {}, the number 1 might be identified with {0} = {{}}, the number 2 might be
identified with {0, 1} = {{}, {{}}}, and so forth. From a logical point of view, pure set
theory is a simpler theory, since one only has to deal with sets and not with objects; 
however, from a conceptual point of view it is often easier to deal with impure set 
theories in which some objects are not considered to be sets. The two types of theories 
are more or less equivalent for the purposes of doing mathematics, and so we shall 
take an agnostic position as to whether all objects are sets or not. For instance, we do 
not insist that a natural number such as 3 be identified with a set as indicated above. 
(The more accurate and mathematically useful statement is that natural numbers 
can be the cardinalities of sets, rather than necessarily being sets themselves. See 
Sect. 3.6.) 
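
The identification of numbers with pure sets mentioned in Remark 3.1.3 can be imitated with Python's frozenset. The text only lists the cases 0, 1, 2 "and so forth"; the sketch below assumes the usual continuation in which the successor of a set S is S ∪ {S}, and the function name von_neumann is ours.

```python
def von_neumann(n):
    """Return a pure-set encoding of n: 0 is {}, and the successor of S is S ∪ {S}."""
    numeral = frozenset()
    for _ in range(n):
        numeral = numeral | frozenset([numeral])   # S ∪ {S}
    return numeral


zero = von_neumann(0)      # {}
one = von_neumann(1)       # {{}} = {0}
two = von_neumann(2)       # {{}, {{}}} = {0, 1}
```

In this encoding the set representing n has exactly n elements, which is consistent with the statement that natural numbers can be the cardinalities of sets.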


To summarize so far, among all the objects studied in mathematics, some of the
objects happen to be sets; and if x is an object and A is a set, then either x ∈ A is
true or x ∈ A is false. (If A is not a set, we leave the statement x ∈ A undefined;
for instance, we consider the statement 3 ∈ 4 to be neither true nor false, but simply
meaningless, since 4 is not a set.)

Next, we try to capture the notion of equality: when are two sets considered to be 
equal? We do not consider the order of the elements inside a set to be important; thus 
we think of {3, 8, 5, 2} and {2, 3, 5, 8} as the same set. On the other hand, {3, 8, 5, 2} 
and {3, 8,5, 2, 1} are different sets, because the latter set contains an element that 
the former one does not, namely the element 1. For similar reasons {3, 8, 5, 2} and 
{3, 8, 5} are different sets. We formalize this by a further axiom: 


Axiom 3.2 (Equality of sets). Two sets A and B are equal, A = B, iff every element 
of A is an element of B and vice versa. To put it another way, A = B if and only if 
every element x of A belongs also to B, and every element y of B belongs also to A. 
Example 3.1.4 Thus, for instance, {1, 2, 3, 4, 5} and {3, 4, 2, 1, 5} are the same set,
since they contain exactly the same elements. (The set {3, 3, 1, 5, 2, 4, 2} is also
equal to {1, 2, 3, 4, 5}; the repetition of 3 and 2 is irrelevant as it does not further
change the status of 2 and 3 being elements of the set.)
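
Python's frozenset behaves the same way for finite sets, which gives a quick way to experiment with Axiom 3.2. This is only an informal model; the helper name sets_equal is ours and simply spells out the axiom in terms of membership.

```python
def sets_equal(X, Y):
    """Axiom 3.2: X = Y iff every element of X lies in Y and vice versa."""
    return all(x in Y for x in X) and all(y in X for y in Y)


A = frozenset([1, 2, 3, 4, 5])
B = frozenset([3, 4, 2, 1, 5])            # same elements, different order
C = frozenset([3, 3, 1, 5, 2, 4, 2])      # repetitions are irrelevant
D = frozenset([3, 8, 5, 2])
E = frozenset([3, 8, 5, 2, 1])            # contains 1, which D does not
```

Here sets_equal agrees with Python's built-in == on frozensets, since both compare sets purely by membership.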


The “is an element of” relation ∈ obeys the axiom of substitution (see Section
A.7). Because of this, any new operation we define on sets will also obey the axiom
of substitution, as long as we can define that operation purely in terms of the relation
∈. This is for instance the case for the remaining definitions in this section. (On the
other hand, we cannot use the notion of the “first” or “last” element in a set in a 
well-defined manner, because this would not respect the axiom of substitution; for 
instance the sets {1, 2, 3, 4, 5} and {3, 4, 2, 1, 5} are the same set, but have different 
first elements.) 

Next, we turn to the issue of exactly which objects are sets and which objects 
are not. The situation is analogous to how we defined the natural numbers in the 
previous chapter; we started with a single natural number, 0, and started building 
more numbers out of 0 using the increment operation. We will try something similar 
here, starting with a single set, the empty set, and building more sets out of the empty 
set by various operations. We begin by postulating the existence of the empty set. 


Axiom 3.3 (Empty set). There exists a set ∅, known as the empty set, which contains
no elements, i.e., for every object x we have x ∉ ∅.


The empty set is also denoted {}. Note that there can only be one empty set; if
there were two sets ∅ and ∅’ which were both empty, then by Axiom 3.2 they would
be equal to each other (why?).

If a set is not equal to the empty set, we call it non-empty. The following statement
is very simple, but worth stating nevertheless: 


Lemma 3.1.5 (Single choice). Let A be a non-empty set. Then there exists an object
x such that x ∈ A.


Proof We prove by contradiction. Suppose there does not exist any object x such
that x ∈ A. Then for all objects x, we have x ∉ A. Also, by Axiom 3.3 we have
x ∉ ∅. Thus x ∈ A ⟺ x ∈ ∅ (both statements are equally false), and so A = ∅
by Axiom 3.2, a contradiction.


Remark 3.1.6 The above Lemma asserts that given any non-empty set A, we are
allowed to “choose” an element x of A which demonstrates this non-emptiness.
Later on (in Lemma 3.5.11) we will show that given any finite number of non-empty
sets, say A_1, ..., A_n, it is possible to choose one element x_1, ..., x_n from each set
A_1, ..., A_n; this is known as “finite choice”. However, in order to choose elements
from an infinite number of sets, we need an additional axiom, the axiom of choice, 
which we will discuss in Sect. 8.4. 


Remark 3.1.7 Note that the empty set is not necessarily the same thing as the natural 
number 0. One is a set; the other is a number. However, it is true that the cardinality 
of the empty set is 0; see Sect. 3.6. 
If Axiom 3.3 was the only axiom that set theory had, then set theory could be
quite boring, as there might be just a single set in existence, the empty set. We now 
present further axioms to enrich the class of sets available. 


Axiom 3.4 (Singleton sets and pair sets). If a is an object, then there exists a set
{a} whose only element is a, i.e., for every object y, we have y ∈ {a} if and only if
y = a; we refer to {a} as the singleton set whose element is a. Furthermore, if a and
b are objects, then there exists a set {a, b} whose only elements are a and b; i.e., for
every object y, we have y ∈ {a, b} if and only if y = a or y = b; we refer to this set
as the pair set formed by a and b.


Remarks 3.1.8 Just as there is only one empty set, there is only one singleton set 
for each object a, thanks to Axiom 3.2 (why?). Similarly, given any two objects a 
and b, there is only one pair set formed by a and b. Also, Axiom 3.2 ensures
that {a, b} = {b, a} (why?) and {a, a} = {a} (why?). Thus the singleton set axiom 
is in fact redundant, being a consequence of the pair set axiom. Conversely, the pair 
set axiom will follow from the singleton set axiom and the pairwise union axiom 
below (see Lemma 3.1.12). One may wonder why we don’t go further and create 
triplet axioms, quadruplet axioms, etc.; however there will be no need for this once 
we introduce the pairwise union axiom below. 


Examples 3.1.9 Since ∅ is a set (and hence an object), the singleton set {∅}, i.e., the
set whose only element is ∅, is also a set. Similarly, the singleton set {{∅}} and the
pair set {∅, {∅}} are also sets. These four sets are not equal to each other (Exercise
3.1.2).


As the above examples show, we can now create quite a few sets; however, the 
sets we make are still fairly small (each set that we can build consists of no more 
than two elements, so far). The next axiom allows us to build somewhat larger sets 
than before. 


Axiom 3.5 (Pairwise union). Given any two sets A, B, there exists a set A ∪ B,
called the union of A and B, which consists of all the elements which belong to A
or B or both. In other words, for any object x,


x ∈ A ∪ B ⟺ (x ∈ A or x ∈ B).


Recall that “or” refers by default in mathematics to inclusive or: “X or Y is true” 
means that “either X is true, or Y is true, or both are true”. See Sect. A.1.


Example 3.1.10 The set {1, 2} ∪ {2, 3} consists of those elements which either lie
in {1, 2} or in {2, 3} or in both, or in other words the elements of this set are simply
1, 2, and 3. Because of this, we denote this set as {1, 2} ∪ {2, 3} = {1, 2, 3}.


Remark 3.1.11 If A, B, A’ are sets, and A is equal to A’, then A ∪ B is equal to
A’ ∪ B (why? One needs to use Axiom 3.5 and Axiom 3.2). Similarly if B’ is a set
which is equal to B, then A ∪ B is equal to A ∪ B’. Thus the operation of union
obeys the axiom of substitution and is thus well-defined on sets.


We now give some basic properties of unions. 


Lemma 3.1.12 If a and b are objects, then {a, b} = {a} ∪ {b}. If A, B, C are sets,
then the union operation is commutative (i.e., A ∪ B = B ∪ A) and associative (i.e.,
(A ∪ B) ∪ C = A ∪ (B ∪ C)). Also, we have A ∪ A = A ∪ ∅ = ∅ ∪ A = A.


Proof We prove just the associativity identity (A ∪ B) ∪ C = A ∪ (B ∪ C) and
leave the remaining claims to Exercise 3.1.3. By Axiom 3.2, we need to show that
every element x of (A ∪ B) ∪ C is an element of A ∪ (B ∪ C), and vice versa.
So suppose first that x is an element of (A ∪ B) ∪ C. By Axiom 3.5, this means
that at least one of x ∈ A ∪ B or x ∈ C is true. We now divide into two cases. If
x ∈ C, then by Axiom 3.5 again x ∈ B ∪ C, and so by Axiom 3.5 again we have
x ∈ A ∪ (B ∪ C). Now suppose instead x ∈ A ∪ B, then by Axiom 3.5 again x ∈ A
or x ∈ B. If x ∈ A then x ∈ A ∪ (B ∪ C) by Axiom 3.5, while if x ∈ B then by
consecutive applications of Axiom 3.5 we have x ∈ B ∪ C and hence x ∈ A ∪ (B ∪ C).
Thus in all cases we see that every element of (A ∪ B) ∪ C lies in A ∪ (B ∪ C). A
similar argument shows that every element of A ∪ (B ∪ C) lies in (A ∪ B) ∪ C, and
so (A ∪ B) ∪ C = A ∪ (B ∪ C) as desired.


Because of the above lemma, we do not need to use parentheses to denote multiple
unions, thus for instance we can write A ∪ B ∪ C instead of (A ∪ B) ∪ C or A ∪
(B ∪ C). Similarly for unions of four sets, A ∪ B ∪ C ∪ D, etc.
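
For finite sets, the pairwise union and the identities of Lemma 3.1.12 can be tried out with Python's frozenset. The sketch is an informal model only: the function name union is ours, and it builds A ∪ B directly from the membership condition in Axiom 3.5 rather than calling Python's built-in union.

```python
def union(A, B):
    """Axiom 3.5: x lies in the result iff x lies in A or x lies in B."""
    return frozenset(list(A) + list(B))


A = frozenset([1, 2])
B = frozenset([2, 3])
C = frozenset([3, 4, 5])
EMPTY = frozenset()

pair_from_singletons = union(frozenset(["a"]), frozenset(["b"]))   # {a} ∪ {b}
left_grouping = union(union(A, B), C)                              # (A ∪ B) ∪ C
right_grouping = union(A, union(B, C))                             # A ∪ (B ∪ C)
```

Since the two groupings produce the same set, writing A ∪ B ∪ C without parentheses is unambiguous in this model as well.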


Remark 3.1.13 While the operation of union has some similarities with addition,
the two operations are not identical. For instance, {2} ∪ {3} = {2, 3} and 2 + 3 = 5,
whereas {2} + {3} is meaningless (addition pertains to numbers, not sets) and 2 ∪ 3
is also meaningless (union pertains to sets, not numbers).


This axiom allows us to define triplet sets, quadruplet sets, and so forth: if a, b, c
are three objects, we define {a, b, c} := {a} ∪ {b} ∪ {c}; if a, b, c, d are four objects,
then we define {a, b, c, d} := {a} ∪ {b} ∪ {c} ∪ {d}, and so forth. On the other hand,
we are not yet in a position to define sets consisting of n objects for any given natural 
number n; this would require iterating the above construction “n times”, but the 
concept of n-fold iteration has not yet been rigorously defined. For similar reasons, 
we cannot yet define sets consisting of infinitely many objects, because that would 
require iterating the axiom of pairwise union infinitely often, and it is not clear at 
this stage that one can do this rigorously. Later on, we will introduce other axioms 
of set theory which allow one to construct arbitrarily large, and even infinite, sets. 

Clearly, some sets seem to be larger than others. One way to formalize this concept 
is through the notion of a subset. 


Definition 3.1.14 (Subsets). Let A, B be sets. We say that A is a subset of B, denoted
A ⊆ B, iff every element of A is also an element of B, i.e.,


For any object x, x ∈ A ⟹ x ∈ B.


We say that A is a proper subset of B, denoted A ⊊ B, if A ⊆ B and A ≠ B.
Remark 3.1.15 Because these definitions involve only the notions of equality and
the “is an element of” relation, both of which already obey the axiom of substitution,
the notion of subset also automatically obeys the axiom of substitution. Thus for
instance if A ⊆ B and A = A’, then A’ ⊆ B.


Examples 3.1.16 We have {1, 2, 4} ⊆ {1, 2, 3, 4, 5}, because every element of {1, 2, 4}
is also an element of {1, 2, 3, 4, 5}. In fact we also have {1, 2, 4} ⊊ {1, 2, 3, 4, 5},
since the two sets {1, 2, 4} and {1, 2, 3, 4, 5} are not equal. Given any set A, we
always have A ⊆ A (why?) and ∅ ⊆ A (why?).


The notion of subset in set theory is similar to the notion of “less than or equal to” 
for numbers, as the following proposition demonstrates (for a more precise statement, 
see Definition 8.5.1): 


Proposition 3.1.17 (Sets are partially ordered by set inclusion). Let A, B, C be
sets. If A ⊆ B and B ⊆ C then A ⊆ C. If A ⊆ B and B ⊆ A, then A = B. Finally,
if A ⊊ B and B ⊊ C then A ⊊ C.


Proof We shall just prove the first claim. Suppose that A ⊆ B and B ⊆ C. To prove
that A ⊆ C, we have to prove that every element of A is an element of C. So, let us
pick an arbitrary element x of A. Then, since A ⊆ B, x must then be an element of
B. But then since B ⊆ C, x is an element of C. Thus every element of A is indeed
an element of C, as claimed.


Remark 3.1.18 The subset relation and the union operation are related to each other: 
see for instance Exercise 3.1.7. 


Remark 3.1.19 There is one important difference between the subset relation ⊆ and
the less than relation <. Given any two distinct natural numbers n, m, we know that
one of them is smaller than the other (Proposition 2.2.13); however, given two distinct
sets, it is not in general true that one of them is a subset of the other. For instance, take
A := {2n : n ∈ N} to be the set of even natural numbers, and B := {2n + 1 : n ∈ N}
to be the set of odd natural numbers. Then neither set is a subset of the other. This
is why we say that sets are only partially ordered, whereas the natural numbers are
totally ordered (see Definitions 8.5.1, 8.5.3).
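The contrast in Remark 3.1.19 can be observed on a finite truncation of the two sets. This is a sketch only: the cutoff LIMIT is an arbitrary assumption, since a program cannot hold all of N.

```python
LIMIT = 10  # hypothetical cutoff; we only look at finitely many natural numbers

evens = frozenset(2 * n for n in range(LIMIT))
odds = frozenset(2 * n + 1 for n in range(LIMIT))

# Neither set is a subset of the other, so they are incomparable under ⊆.
print(evens <= odds, odds <= evens)  # False False

# By contrast, any two distinct natural numbers are comparable under <.
comparable = all(
    n < m or m < n
    for n in range(LIMIT)
    for m in range(LIMIT)
    if n != m
)
print(comparable)  # True
```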


Remark 3.1.20 We should also caution that the subset relation ⊆ is not the same
as the element relation ∈. The number 2 is an element of {1, 2, 3} but not a subset;
thus 2 ∈ {1, 2, 3}, but 2 ⊈ {1, 2, 3}. Indeed, 2 is not even a set. Conversely, while {2}
is a subset of {1, 2, 3}, it is not an element; thus {2} ⊆ {1, 2, 3} but {2} ∉ {1, 2, 3}.
The point is that the number 2 and the set {2} are distinct objects. It is important 
to distinguish sets from their elements, as they can have different properties. For 
instance, it is possible to have an infinite set consisting of finite numbers (the set N 
of natural numbers is one such example), and it is also possible to have a finite set 
consisting of infinite objects (consider for instance the finite set {N, Z, Q, R}, which 
has four elements, all of which are infinite). 
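The distinction between ∈ and ⊆ in Remark 3.1.20 is also visible in code, where membership (`in`) and inclusion (`<=`) are different operations. The snippet below is an informal illustration using Python's frozenset; the variable names are ours.

```python
S = frozenset({1, 2, 3})
two = 2
singleton_two = frozenset({2})

print(two in S)              # True:  2 ∈ {1, 2, 3}
print(singleton_two <= S)    # True:  {2} ⊆ {1, 2, 3}
print(singleton_two in S)    # False: {2} ∉ {1, 2, 3}
print(two == singleton_two)  # False: the number 2 and the set {2} are distinct objects
```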


We now give an axiom which easily allows us to create subsets out of larger sets.


Axiom 3.6 (Axiom of specification). Let A be a set, and for each x ∈ A, let P(x)
be a property pertaining to x (i.e., for each x ∈ A, P(x) is either a true statement or
a false statement). Then there exists a set, called {x ∈ A : P(x) is true} (or simply
{x ∈ A : P(x)} for short), whose elements are precisely the elements x in A for
which P(x) is true. In other words, for any object y,


y ∈ {x ∈ A : P(x) is true} ⟺ (y ∈ A and P(y) is true).


This axiom is also known as the axiom of separation. Note that {x ∈ A :
P(x) is true} is always a subset of A (why?), though it could be as large as A or
as small as the empty set. One can verify that the axiom of substitution works for
specification, thus if A = A′ then {x ∈ A : P(x)} = {x ∈ A′ : P(x)} (why?).


Example 3.1.21 Let S := {1, 2, 3, 4, 5}. Then the set {n ∈ S : n < 4} is the set of
those elements n in S for which n < 4 is true, i.e., {n ∈ S : n < 4} = {1, 2, 3}.
Similarly, the set {n ∈ S : n < 7} is the same as S itself, while {n ∈ S : n < 1} is the
empty set.
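The axiom of specification corresponds closely to filtering a finite set by a predicate. As a rough sketch (the function name specify is hypothetical, and P is passed as an ordinary Python function), Example 3.1.21 can be reproduced as follows.

```python
def specify(A, P):
    """Return {x ∈ A : P(x) is true} for a finite set A and a predicate P."""
    return frozenset(x for x in A if P(x))


S = frozenset({1, 2, 3, 4, 5})

print(sorted(specify(S, lambda n: n < 4)))  # [1, 2, 3]
print(specify(S, lambda n: n < 7) == S)     # True: as large as S itself
print(specify(S, lambda n: n < 1))          # frozenset(): the empty set
```

Whatever predicate is supplied, the result is always a subset of the set one started with, which is the point of the "(why?)" following the axiom.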


We sometimes write {x ∈ A | P(x)} instead of {x ∈ A : P(x)}; this is useful when
we are using the colon “:” to denote something else, for instance to denote the domain
and codomain of a function f : X → Y. We can also describe {x ∈ A : P(x)} in
words as “the set of all x in A such that P(x) is true”.

We can use this axiom of specification to define some further operations on sets,
namely intersections and difference sets.


Definition 3.1.22 (Intersections). The intersection S₁ ∩ S₂ of two sets is defined to
be the set

S₁ ∩ S₂ := {x ∈ S₁ : x ∈ S₂}.


In other words, S₁ ∩ S₂ consists of all the elements which belong to both S₁ and S₂.
Thus, for all objects x,


x ∈ S₁ ∩ S₂ ⟺ x ∈ S₁ and x ∈ S₂.


Remark 3.1.23 Note that this definition is well-defined (i.e., it obeys the axiom of
substitution, see Sect. A.7) because it is defined in terms of more primitive operations 
which were already known to obey the axiom of substitution. Similar remarks apply 
to future definitions in this chapter and will usually not be mentioned explicitly again. 


Examples 3.1.24 We have {1, 2, 4} ∩ {2, 3, 4} = {2, 4}, {1, 2} ∩ {3, 4} = ∅, {2, 3} ∪
∅ = {2, 3}, and {2, 3} ∩ ∅ = ∅.
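Definition 3.1.22 builds the intersection out of the axiom of specification, and the same construction can be written down directly for finite sets. This is a sketch under the assumption that sets are modelled by frozenset; the name intersection is ours.

```python
def intersection(S1, S2):
    """S1 ∩ S2 := {x ∈ S1 : x ∈ S2}, obtained by specification on S1."""
    return frozenset(x for x in S1 if x in S2)


EMPTY = frozenset()

print(sorted(intersection({1, 2, 4}, {2, 3, 4})))  # [2, 4]
print(intersection({1, 2}, {3, 4}) == EMPTY)       # True
print(intersection({2, 3}, EMPTY) == EMPTY)        # True
print(frozenset({2, 3}) | EMPTY == {2, 3})         # True: union with the empty set
```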


Remark 3.1.25 By the way, one should be careful with the English word “and”: 
rather confusingly, it can mean either union or intersection, depending on context. 
For instance, if one talks about a set of “boys and girls”, one means the union of a 
set of boys with a set of girls, but if one talks about the set of people who are single
and male, then one means the intersection of the set of single people with the set of
male people. (Can you work out the rule of grammar that determines when “and”
means union and when “and” means intersection?) Another problem is that “and” is
also used in English to denote addition, thus for instance one could say that “2 and
3 is 5”, while also saying that “the elements of {2} and the elements of {3} form the
set {2, 3}” and “the elements in {2} and {3} form the set ∅”. This can certainly get
confusing! One reason we resort to mathematical symbols instead of English words 
such as “and” is that mathematical symbols always have a precise and unambiguous 
meaning, whereas one must often look very carefully at the context in order to work 
out what an English word means. 


Two sets A, B are said to be disjoint if A ∩ B = ∅. Note that this is not the same
concept as being distinct, A ≠ B. For instance, the sets {1, 2, 3} and {2, 3, 4} are
distinct (there are elements of one set which are not elements of the other) but not
disjoint (because their intersection is non-empty). Meanwhile, the sets ∅ and ∅ are
disjoint but not distinct (why?).
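The two notions just contrasted, disjoint and distinct, are easy to confuse, so it may help to see them evaluated side by side. The helper names below are our own illustrative choices.

```python
def disjoint(A, B):
    """A and B are disjoint when A ∩ B = ∅."""
    return len(A & B) == 0


def distinct(A, B):
    """A and B are distinct when A ≠ B."""
    return A != B


A = frozenset({1, 2, 3})
B = frozenset({2, 3, 4})
EMPTY = frozenset()

print(distinct(A, B), disjoint(A, B))              # True False
print(distinct(EMPTY, EMPTY), disjoint(EMPTY, EMPTY))  # False True
```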

There is an operation on sets that is somewhat analogous to subtraction: 


Definition 3.1.26 (Difference sets). Given two sets A and B, we define the set A − B
or A\B to be the set A with any elements of B removed:


A\B := {x ∈ A : x ∉ B};


for instance, {1, 2, 3, 4}\{2, 4, 6} = {1, 3}. In many cases B will be a subset of A, 
but not necessarily. 
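Like the intersection, the difference set is an instance of specification. A minimal sketch for finite sets, with the hypothetical helper name difference:

```python
def difference(A, B):
    """Elements of A that are not elements of B, obtained by specification on A."""
    return frozenset(x for x in A if x not in B)


A = frozenset({1, 2, 3, 4})
B = frozenset({2, 4, 6})

print(sorted(difference(A, B)))  # [1, 3]
print(B <= A)                    # False: B need not be a subset of A
print(difference(A, B) == A - B) # True: agrees with Python's built-in set difference
```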


We now give some basic properties of unions, intersections, and difference sets. 


Proposition 3.1.27 (Sets form a boolean algebra). Let A, B, C be sets, and let X be
a set containing A, B, C as subsets.


(a) (Minimal element) We have A ∪ ∅ = A and A ∩ ∅ = ∅.

(b) (Maximal element) We have A ∪ X = X and A ∩ X = A.

(c) (Identity) We have A ∩ A = A and A ∪ A = A.

(d) (Commutativity) We have A ∪ B = B ∪ A and A ∩ B = B ∩ A.

(e) (Associativity) We have (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩
(B ∩ C).

(f) (Distributivity) We have A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) =
(A ∪ B) ∩ (A ∪ C).

(g) (Partition) We have A ∪ (X\A) = X and A ∩ (X\A) = ∅.

(h) (De Morgan laws) We have X\(A ∪ B) = (X\A) ∩ (X\B) and X\(A ∩ B) =
(X\A) ∪ (X\B).
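Before attempting a proof, one can at least test the laws (a) to (h) exhaustively on a small example. The sketch below checks every triple of subsets of the hypothetical set X = {1, 2, 3}; such a check is of course not a proof, merely a sanity test, and the function names are our own.

```python
from itertools import combinations


def subsets(X):
    """All subsets of a finite set X, as frozensets."""
    items = list(X)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


X = frozenset({1, 2, 3})
EMPTY = frozenset()


def laws_hold(A, B, C):
    """Check laws (a)-(h) for particular subsets A, B, C of X."""
    return all([
        A | EMPTY == A and A & EMPTY == EMPTY,                             # (a)
        A | X == X and A & X == A,                                         # (b)
        A & A == A and A | A == A,                                         # (c)
        A | B == B | A and A & B == B & A,                                 # (d)
        (A | B) | C == A | (B | C) and (A & B) & C == A & (B & C),         # (e)
        A & (B | C) == (A & B) | (A & C) and A | (B & C) == (A | B) & (A | C),  # (f)
        A | (X - A) == X and A & (X - A) == EMPTY,                         # (g)
        X - (A | B) == (X - A) & (X - B) and X - (A & B) == (X - A) | (X - B),  # (h)
    ])


all_ok = all(
    laws_hold(A, B, C)
    for A in subsets(X)
    for B in subsets(X)
    for C in subsets(X)
)
print(all_ok)  # True
```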



Remark 3.1.28 The de Morgan laws are named after the logician Augustus De 
Morgan (1806-1871), who identified them as one of the basic laws of set theory. 


Proof See Exercise 3.1.6.


Remark 3.1.29 The reader may observe a certain symmetry in the above laws
between ∪ and ∩, and between X and ∅. This is an example of duality: two distinct
properties or objects being dual to each other. In this case, the duality is manifested
by the complementation relation A ↦ X\A; the de Morgan laws assert that this
relation converts unions into intersections and vice versa. (It also interchanges X
and the empty set.) The above laws are collectively known as the laws of Boolean
algebra, after the mathematician George Boole (1815–1864), and are also applicable
to a number of objects other than sets; they play a particularly important role
in mathematical logic. 


We have now accumulated a number of axioms and results about sets, but there 
are still many things we are not able to do yet. One of the basic things we wish to do 
with a set is take each of the objects of that set, and somehow transform each such 
object into a new object; for instance we may wish to start with a set of numbers, say 
{3, 5, 9}, and increment each one, creating a new set {4, 6, 10}. This is not something 
we can do directly using only the axioms we already have, so we need a new axiom: 


Axiom 3.7 (Replacement). Let A be a set. For any object x ∈ A, and any object
y, suppose we have a statement P(x, y) pertaining to x and y, such that for each
x ∈ A there is at most one y for which P(x, y) is true. Then there exists a set
{y : P(x, y) is true for some x ∈ A}, such that for any object z,


z ∈ {y : P(x, y) is true for some x ∈ A}
⟺ P(x, z) is true for some x ∈ A.


Example 3.1.30 Let A := {3, 5, 9}, and let P(x, y) be the statement y = x++, i.e.,
y is the successor of x. Observe that for every x ∈ A, there is exactly one y for which
P(x, y) is true, namely the successor of x. Thus the above axiom asserts that
the set {y : y = x++ for some x ∈ {3, 5, 9}} exists; in this case, it is clearly the same
set as {4, 6, 10} (why?).


Example 3.1.31 Let A = {3, 5, 9}, and let P(x, y) be the statement y = 1. Then
again for every x ∈ A, there is exactly one y for which P(x, y) is true, namely
the number 1. In this case {y : y = 1 for some x ∈ {3, 5, 9}} is just the singleton set
{1}; we have replaced each element 3, 5, 9 of the original set A by the same object,
namely 1. Thus this rather silly example shows that the set obtained by the above
axiom can be “smaller” than the original set.
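Examples 3.1.30 and 3.1.31 can be imitated for finite sets. In the sketch below, the function name replace and the finite search range candidates are assumptions of the illustration: the axiom itself quantifies over all objects y, which a program cannot enumerate, so we search only a small pool of candidate values and check the "at most one y" hypothesis within that pool.

```python
def replace(A, P, candidates):
    """Return {y : P(x, y) is true for some x ∈ A}, looking for y among `candidates`."""
    for x in A:
        # Hypothesis of the axiom: at most one y per x satisfies P(x, y).
        matches = sum(1 for y in candidates if P(x, y))
        assert matches <= 1, "P relates some x to more than one y"
    return frozenset(y for x in A for y in candidates if P(x, y))


A = frozenset({3, 5, 9})
candidates = range(20)  # hypothetical finite pool of possible outputs

print(sorted(replace(A, lambda x, y: y == x + 1, candidates)))  # [4, 6, 10]
print(sorted(replace(A, lambda x, y: y == 1, candidates)))      # [1]
```

The second call shows the collapse described in Example 3.1.31: three inputs, but only one element in the resulting set.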


We often abbreviate a set of the form 


{y : y = f(x) for some x ∈ A}


as {f(x) : x ∈ A} or {f(x) | x ∈ A}. Thus for instance, if A = {3, 5, 9}, then {x++ :
x ∈ A} is the set {4, 6, 10}. We can of course combine the axiom of replacement
with the axiom of specification, thus for instance we can create sets such as {f(x) :
x ∈ A; P(x) is true} by starting with the set A, using the axiom of specification to
create the set {x ∈ A : P(x) is true}, and then applying the axiom of replacement
to create {f(x) : x ∈ A; P(x) is true}. Thus for instance {n++ : n ∈ {3, 5, 9}; n <
6} = {4, 6}.
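The combined notation {f(x) : x ∈ A; P(x) is true} has a close analogue in the comprehension syntax of several programming languages. As an informal illustration in Python (with n + 1 standing in for n++), the filter plays the role of specification and the mapped expression plays the role of replacement.

```python
A = frozenset({3, 5, 9})

# Specification first: keep those n in A with n < 6.
specified = frozenset(n for n in A if n < 6)

# Replacement second: send each remaining n to its successor.
replaced = frozenset(n + 1 for n in specified)

# Both steps at once, mirroring {n++ : n ∈ {3, 5, 9}; n < 6}.
combined = frozenset(n + 1 for n in A if n < 6)

print(sorted(specified))     # [3, 5]
print(sorted(replaced))      # [4, 6]
print(combined == replaced)  # True
```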

In many of our examples we have implicitly assumed that natural numbers are in 
fact objects. Let us formalize this as follows. 


Axiom 3.8 (Infinity). There exists a set N, whose elements are called natural
numbers, as well as an object 0 in N, and an object n++ assigned to every natural number
n ∈ N, such that the Peano axioms (Axioms 2.1–2.5) hold.


This is the more formal version of Assumption 2.6. It is called the axiom of infinity 
because it introduces the most basic example of an infinite set, namely the set of 
natural numbers N. (We will formalize what finite and infinite mean in Sect. 3.6.) 
From the axiom of infinity we see that numbers such as 3, 5, and 7 are indeed objects 
in set theory, and so (from the pair set axiom and pairwise union axiom) we can 
indeed legitimately construct sets such as {3,5,9} as we have been doing in our 
examples. 

One has to keep the concept of a set distinct from the elements of that set; for
instance, the set {n + 3 : n ∈ N, 0 ≤ n ≤ 5} is not the same thing as the expression
or function n + 3. We emphasize this with an example:


Example 3.1.32 (Informal) This example requires the notion of subtraction, which
has not yet been formally introduced. The following two sets are equal,


{n + 3 : n ∈ N, 0 ≤ n ≤ 5} = {8 − n : n ∈ N, 0 ≤ n ≤ 5},    (3.1)


(see below), even though the expressions n + 3 and 8 − n are never equal to each
other for any natural number n. Thus, it is a good idea to remember to use those
curly braces {} when you talk about sets, lest you accidentally confuse a set with its
elements. One reason for this counterintuitive situation is that the letter n is being
used in two different ways on the two sides of (3.1). To clarify the situation, let
us rewrite the set {8 − n : n ∈ N, 0 ≤ n ≤ 5} by replacing the letter n by the letter
m, thus giving {8 − m : m ∈ N, 0 ≤ m ≤ 5}. This is exactly the same set as before
(why?), so we can rewrite (3.1) as


{n + 3 : n ∈ N, 0 ≤ n ≤ 5} = {8 − m : m ∈ N, 0 ≤ m ≤ 5}.


Now it is easy to see (using Axiom 3.2) why this identity is true: every number of the
form n + 3, where n is a natural number between 0 and 5, is also of the form 8 − m
where m := 5 − n (note that m is therefore also a natural number between 0 and 5);
conversely, every number of the form 8 − m, where m is a natural number between 0
and 5, is also of the form n + 3, where n := 5 − m (note that n is therefore a natural
number between 0 and 5). Observe how much more confusing the above explanation
of (3.1) would have been if we had not changed one of the n’s to an m first!
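Identity (3.1) is small enough to check by direct enumeration. The following sketch assumes the bounds are inclusive, i.e. that n and m range over 0, 1, ..., 5, and uses ordinary integer subtraction in place of the not yet defined subtraction on N.

```python
left = frozenset(n + 3 for n in range(0, 6))   # {n + 3 : 0 ≤ n ≤ 5}
right = frozenset(8 - m for m in range(0, 6))  # {8 − m : 0 ≤ m ≤ 5}

print(sorted(left))   # [3, 4, 5, 6, 7, 8]
print(left == right)  # True: the two sets are equal

# Yet the two expressions never agree at the same input n.
print(any(n + 3 == 8 - n for n in range(0, 6)))  # False

# The change of variable m := 5 − n matches the elements up one by one.
print(all(n + 3 == 8 - (5 - n) for n in range(0, 6)))  # True
```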
Formally, we can refer to N as “the set of natural numbers”, but we shall often
abbreviate this to simply “the natural numbers”. Similarly for some other sets that 
we will introduce later in this text; for instance Z will be “the set of integers” but 
also the “integers”, R will be the “set of real numbers” but also “the real numbers” 
or even just “the reals”, and so forth. 


— Exercises — 


Exercise 3.1.1 Let a, b, c, d be objects such that {a, b} = {c, d}. Show that at least one of the two
statements “a = c and b = d” and “a = d and b = c” holds.


Exercise 3.1.2 Using only Axiom 3.2, Axiom 3.1, Axiom 3.3, and Axiom 3.4, prove that the sets
∅, {∅}, {{∅}}, and {∅, {∅}} are all distinct (i.e., no two of them are equal to each other).


Exercise 3.1.3 Prove the remaining claims in Lemma 3.1.12.


Exercise 3.1.4 Prove the remaining claims in Proposition 3.1.17.


Exercise 3.1.5 Let A, B be sets. Show that the three statements A ⊆ B, A ∪ B = B, A ∩ B = A
are logically equivalent (any one of them implies the other two).


Exercise 3.1.6 Prove Proposition 3.1.27. (Hint: one can use some of these claims to prove others. 
Some of the claims have also appeared previously in Lemma 3.1.12.) 


Exercise 3.1.7 Let A, B, C be sets. Show that A ∩ B ⊆ A and A ∩ B ⊆ B. Furthermore, show
that C ⊆ A and C ⊆ B if and only if C ⊆ A ∩ B. In a similar spirit, show that A ⊆ A ∪ B and
B ⊆ A ∪ B, and furthermore that A ⊆ C and B ⊆ C if and only if A ∪ B ⊆ C.


Exercise 3.1.8 Let A, B be sets. Prove the absorption laws A ∩ (A ∪ B) = A and A ∪ (A ∩ B) =
A.


Exercise 3.1.9 Let A, B, X be sets such that A ∪ B = X and A ∩ B = ∅. Show that A = X\B
and B = X\A.


Exercise 3.1.10 Let A and B be sets. Show that the three sets A\B, A ∩ B, and B\A are disjoint,
and that their union is A ∪ B.


Exercise 3.1.11 Show that the axiom of replacement implies the axiom of specification. 


Exercise 3.1.12 Suppose that A, B, A′, B′ are sets such that A′ ⊆ A and B′ ⊆ B.


(i) Show that A′ ∪ B′ ⊆ A ∪ B and A′ ∩ B′ ⊆ A ∩ B.

(ii) Give a counterexample to show that the statement A′\B′ ⊆ A\B is false. Can you find a
modification of this statement involving the set difference operation \ which is true given the
stated hypotheses? Justify your answer.


Exercise 3.1.13 Euclid famously defined a point to be “that which has no part”. This exercise 
should be reminiscent of that definition. Define a proper subset of a set A to be a subset B of A
with B ≠ A. Let A be a non-empty set. Show that A does not have any non-empty proper subsets
if and only if A is of the form A = {x} for some object x.


3.2 Russell’s Paradox (Optional)


Many of the axioms introduced in the previous section have a similar flavor: they 
allow us to form a set consisting of all the elements which have a certain property. 
These axioms are plausible, but one might think that they could be unified, for 
instance by introducing the following axiom: 


Axiom 3.9 (Universal specification). (Dangerous!) Suppose for every object x we
have a property P(x) pertaining to x (so that for every x, P(x) is either a true
statement or a false statement). Then there exists a set {x : P(x) is true} such that
for every object y,


y ∈ {x : P(x) is true} ⟺ P(y) is true.


This axiom is also known as the axiom of comprehension. It asserts that every 
property corresponds to a set; if we assumed that axiom, we could talk about the set 
of all blue objects, the set of all natural numbers, the set of all sets, and so forth. 
This axiom also implies most of the axioms in the previous section (Exercise 3.2.1). 
Unfortunately, this axiom cannot be introduced into set theory, because it creates a 
logical contradiction known as Russell’s paradox, discovered by the philosopher and 
logician Bertrand Russell (1872-1970) in 1901. The paradox runs as follows. Let 
P(x) be the statement 


P(x) ⟺ “x is a set, and x ∉ x”;


i.e., P(x) is true only when x is a set which does not contain itself. For instance, 
P({2, 3, 4}) is true, since the set {2, 3, 4} is not one of the three elements 2, 3, 4 of 
{2, 3, 4}. On the other hand, if we let S be the set of all sets (which we would know 
to exist from the axiom of universal specification), then since S is itself a set, it is an 
element of S, and so P(S) is false. Now use the axiom of universal specification to 
create the set 


Ω := {x : P(x) is true} = {x : x is a set and x ∉ x},


i.e., the set of all sets which do not contain themselves. Now ask the question: does
Ω contain itself, i.e. is Ω ∈ Ω? If Ω did contain itself, then by the definition of Ω
this means that P(Ω) is true, i.e., Ω is a set and Ω ∉ Ω. On the other hand, if Ω did
not contain itself, then by the definition of P, P(Ω) would be true, and hence by the
definition of Ω we have Ω ∈ Ω. Thus in either case we have both Ω ∈ Ω and Ω ∉ Ω,
which is absurd.

The problem with the above axiom is that it creates sets which are far too “large”;
for instance, we can use that axiom to talk about the set of all objects (a so-called
universal set). Since sets are themselves objects (Axiom 3.1), this means that sets are
allowed to contain themselves, which is a somewhat silly state of affairs. One way to
informally resolve this issue is to think of objects as being arranged in a hierarchy.
At the bottom of the hierarchy are the primitive objects, i.e., the objects that are not
sets,¹ such as the natural number 37. Then on the next rung of the hierarchy there are
sets whose elements consist only of primitive objects, such as {3, 4, 7} or the empty
set ∅; let’s call these “primitive sets” for now. Then there are sets whose elements
consist only of primitive objects and primitive sets, such as {3, 4, 7, {3, 4, 7}}. Then 
we can form sets out of these objects, and so forth. The point is that at each stage of 
the hierarchy we only see sets whose elements consist of objects at lower stages of 
the hierarchy, and so at no stage do we ever construct a set which contains itself. 

Formalizing the above intuition of a hierarchy of objects is actually rather
complicated, and we will not do so here. Instead, we shall simply postulate an axiom
which ensures that absurdities such as Russell’s paradox do not occur. 


Axiom 3.10 (Regularity). If A is a non-empty set, then there is at least one element 
x of A which is either not a set, or is disjoint from A. 


The point of this axiom (which is also known as the axiom of foundation) is 
that it is asserting that at least one of the elements of A is so low on the hierarchy 
of objects that it does not contain any of the other elements of A. For instance, if 
A = {{3, 4}, {3, 4, {3, 4}}}, then the element {3, 4} ∈ A does not contain any of the
elements of A (neither 3 nor 4 lies in A), although the element {3, 4, {3, 4}}, being 
somewhat higher in the hierarchy, does contain an element of A, namely {3, 4}. One 
particular consequence of this axiom is that sets are no longer allowed to contain 
themselves (Exercise 3.2.2). 
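The example just given can be examined concretely. The sketch below is an informal model only: sets are represented by nested Python frozensets, integers play the role of primitive objects, and the helper names is_set and regular_witnesses are our own. For a given non-empty set A it lists the elements x whose existence the axiom of regularity asserts.

```python
def is_set(obj):
    """In this toy model, sets are frozensets and everything else is primitive."""
    return isinstance(obj, frozenset)


def regular_witnesses(A):
    """Elements x of A which are either not sets, or are sets disjoint from A."""
    return [x for x in A if not is_set(x) or x.isdisjoint(A)]


inner = frozenset({3, 4})
outer = frozenset({3, 4, inner})
A = frozenset({inner, outer})

witnesses = regular_witnesses(A)
print(witnesses == [inner])  # True: only {3, 4} is disjoint from A
print(inner in outer)        # True: {3, 4, {3, 4}} contains an element of A
```

In this model every non-empty set one can actually build has at least one witness, in line with the hierarchy picture described above.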

One can legitimately ask whether we really need this axiom in our set theory, 
as it is certainly less intuitive than our other axioms. For the purposes of doing 
analysis, it turns out in fact that this axiom is never needed; all the sets we consider 
in analysis are typically very low on the hierarchy of objects, for instance being sets 
of primitive objects, or sets of sets of primitive objects, or at worst sets of sets of 
sets of primitive objects. However it is necessary to include this axiom in order to 
perform more advanced set theory, and so we have included this axiom in the text 
(but in an optional section) for sake of completeness. 


¹ In pure set theory, there will be no primitive objects, but there will be one primitive set ∅ on the
next rung of the hierarchy.


— Exercises —


Exercise 3.2.1 Show that the universal specification axiom, Axiom 3.9, if assumed to be true, 
would imply Axioms 3.3, 3.4, 3.5, 3.6, and 3.7. (If we assume that all natural numbers are objects, 
we also obtain Axiom 3.8.) Thus, this axiom, if permitted, would simplify the foundations of set 
theory tremendously (and can be viewed as one basis for an intuitive model of set theory known as 
“naive set theory”). Unfortunately, as we have seen, Axiom 3.9 is “too good to be true”! 


Exercise 3.2.2 Use the axiom of regularity (and the singleton set axiom) to show that if A is a set,
then A ∉ A. Furthermore, show that if A and B are two sets, then either A ∉ B or B ∉ A (or both).
(One corollary of this exercise is worth noting: given any set A, there exists a mathematical object
that is not an element in A, namely A itself. Thus one can always “add one more element” to a set
A to create a larger set, namely A ∪ {A}.)


Exercise 3.2.3 Show (assuming the other axioms of set theory) that the universal specification 
axiom, Axiom 3.9, is equivalent to an axiom postulating the existence of a “universal set” Ω
consisting of all objects (i.e., for all objects x, we have x ∈ Ω). In other words, if Axiom 3.9 is true,
then a universal set exists, and conversely, if a universal set exists, then Axiom 3.9 is true. (This
helps explain why Axiom 3.9 is called the axiom of universal specification.) Note that if a universal
set Ω existed, then we would have Ω ∈ Ω by Axiom 3.1, contradicting Exercise 3.2.2. Thus the
axiom of foundation specifically rules out the axiom of universal specification. 


3.3 Functions 


In order to do analysis, it is not particularly useful to just have the notion of a set; 
we also need the notion of a function from one set to another. Informally, a function 
f : X → Y from one set X to another set Y is an operation which assigns to each
element (or “input”) x in X, a single element (or “output”) f(x) in Y; we have 
already used this informal concept in the previous chapter when we discussed the 
natural numbers. The formal definition is as follows. 


Definition 3.3.1 (Functions) Let X, Y be sets, and let P(x, y) be a property
pertaining to an object x ∈ X and an object y ∈ Y, such that for every x ∈ X, there is
exactly one y ∈ Y for which P(x, y) is true (this is sometimes known as the vertical
line test). Then we define the function f : X → Y defined by P on the domain
X and codomain² Y to be the object which, given any input x ∈ X, assigns an output
f(x) ∈ Y, defined to be the unique object f(x) ∈ Y for which P(x, f(x)) is true.
Thus, for any x ∈ X and y ∈ Y,


y = f(x) ⟺ P(x, y) is true.


Functions are also referred to as maps or transformations, depending on the con- 
text. They are also sometimes called morphisms, although to be more precise, a 
morphism refers to a more general class of object, which may or may not correspond 
to actual functions, depending on the context. 


² In some texts the codomain is referred to as the range; however we will use the term range to refer
instead to the image f(X) of the domain, defined after Definition 3.4.1.


Remark 3.3.2 Implicit in the above definition is an assumption that whenever one
is given two sets X, Y and a property P obeying the vertical line test, one can form a
function object f. Strictly speaking, the assumption of the existence of such a function
object f should be stated as an explicit axiom. However, we will not do so here, as
it turns out to be redundant. (More precisely, in view of Exercise 3.5.10, it is always
possible to encode a function f as an ordered triple (X, Y, {(x, f(x)) : x ∈ X})
consisting of the domain, codomain, and graph of the function, which gives a way
to build functions as objects using the operations provided by the preceding axioms
of set theory.) Also implicit in the above definition is the understanding that every
function f is automatically associated with a domain X, a codomain Y, and a defining
property P.


Example 3.3.3 Let X = N, Y = N, and let P(x, y) be the property that y = x++.
Then for each x ∈ N there is exactly one y ∈ N for which P(x, y) is true, namely
y = x++. Thus we can define a function f : N → N associated to this property, so
that f(x) = x++ for all x; this is the increment function on N, which takes a natural
number as input and returns its increment as output. Thus for instance f(4) = 5,
f(2n + 3) = 2n + 4 and so forth. One might also hope to define a decrement function
g : N → N associated to the property P(x, y) defined by y++ = x, i.e., g(x) would
be the number whose increment is x. Unfortunately this does not define a function,
because when x = 0 there is no natural number y whose increment is equal to x
(Axiom 2.3). On the other hand, we can legitimately define a decrement function
h : N\{0} → N associated to the property P(x, y) defined by y++ = x, because
when x ∈ N\{0} there is indeed exactly one natural number y such that y++ = x,
thanks to Lemma 2.2.10. Thus for instance h(4) = 3 and h(2n + 3) = 2n + 2, but
h(0) is undefined since 0 is not in the domain N\{0}.
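The role of the domain in Example 3.3.3 can be made explicit in code by checking inputs before producing an output. The following is a sketch under the assumption that natural numbers are modelled by non-negative Python integers; the names is_natural, increment and decrement are illustrative.

```python
def is_natural(x):
    """Model of membership in N: a non-negative integer (booleans excluded)."""
    return isinstance(x, int) and not isinstance(x, bool) and x >= 0


def increment(x):
    """f : N -> N, f(x) = x++."""
    if not is_natural(x):
        raise ValueError("input is not in the domain N")
    return x + 1


def decrement(x):
    """h : N minus {0} -> N, the unique y with y++ = x."""
    if not is_natural(x) or x == 0:
        raise ValueError("input is not in the domain N minus {0}")
    return x - 1


print(increment(4))  # 5
print(decrement(4))  # 3

try:
    decrement(0)
except ValueError as err:
    print("undefined:", err)  # 0 is outside the domain of h
```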


Example 3.3.4 (Informal) This example requires the real numbers R, which we
will define in Chap. 5. One could try to define a square root function √ : R → R
by associating it to the property P(x, y) defined by y² = x, i.e., we would want √x
to be the number y such that y² = x. Unfortunately there are two problems which
prohibit this definition from actually creating a function. The first is that there exist
real numbers x for which P(x, y) is never true, for instance if x = −1 then there is no
real number y such that y² = x. This problem however can be solved by restricting
the domain from R to the right half-line [0, +∞). The second problem is that even
when x ∈ [0, +∞), it is possible for there to be more than one y in the codomain
R for which y² = x, for instance if x = 4 then both y = 2 and y = −2 obey the
property P(x, y), i.e., both +2 and −2 are square roots of 4. This problem can
however be solved by restricting the codomain of R to [0, +∞). Once one does this,
then one can correctly define a square root function √ : [0, +∞) → [0, +∞) using
the relation y² = x; thus √x is the unique number y ∈ [0, +∞) such that y² = x.
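The two obstructions in Example 3.3.4 can be seen in miniature by running the vertical line test on finite sets. In the sketch below, small ranges of integers are used as hypothetical stand-ins for pieces of R and [0, +∞), and the function names are our own.

```python
def passes_vertical_line_test(X, Y, P):
    """True when every x in X has exactly one y in Y with P(x, y) true."""
    return all(sum(1 for y in Y if P(x, y)) == 1 for x in X)


def is_square_root(x, y):
    """The property P(x, y): y² = x."""
    return y * y == x


both_signs = range(-3, 4)   # stand-in for a codomain containing negatives
non_negative = range(0, 4)  # stand-in for a codomain inside [0, +∞)

# Problem 1: x = -1 has no square root at all.
print(passes_vertical_line_test([-1, 0, 1, 4], both_signs, is_square_root))   # False

# Problem 2: even for non-negative x, both y and -y can satisfy y² = x.
print(passes_vertical_line_test([0, 1, 4, 9], both_signs, is_square_root))    # False

# Restricting both domain and codomain removes both problems.
print(passes_vertical_line_test([0, 1, 4, 9], non_negative, is_square_root))  # True
```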


One common way to define a function is simply to specify its domain, its 
codomain, and how one generates the output f(x) from each input; this is known 
as an explicit definition of a function. For instance, the function f in Example 3.3.3 
could be defined explicitly by saying that f has domain and codomain equal to N,
and f(x) := x++ for all x ∈ N. In other cases we only define a function f by
specifying what property P(x, y) links the input x with the output f(x); this is an implicit
definition of a function. For instance, the square root function √x in Example 3.3.4
was defined implicitly by the relation (√x)² = x. Note that an implicit definition is
only valid if we know that for every input there is exactly one output which obeys
the implicit relation. In many cases we omit specifying the domain and codomain
of a function for brevity, and thus for instance we could refer to the function f
in Example 3.3.3 as “the function f(x) := x++”, “the function x ↦ x++”, “the
function x++”, or even the extremely abbreviated “++”. However, too much of this
abbreviation can be dangerous; sometimes it is important to know what the domain
and codomain of the function are.

We observe that functions obey the axiom of substitution: if x = x′, then f(x) =
f(x′) (why?). In other words, equal inputs imply equal outputs. On the other hand,
unequal inputs do not necessarily ensure unequal outputs, as the following example 
shows: 


Example 3.3.5 Let X = N, Y = N, and let P(x, y) be the property that y = 7. Then
certainly for every x ∈ N there is exactly one y for which P(x, y) is true, namely
the number 7. Thus we can create a function f : N → N associated to this property;
it is simply the constant function which assigns the output of f(x) = 7 to each input
x ∈ N. Thus it is certainly possible for different inputs to generate the same output.


Remark 3.3.6 We are now using parentheses () to denote several different things in
mathematics; on one hand, we are using them to clarify the order of operations
(compare for instance 2 + (3 × 4) = 14 with (2 + 3) × 4 = 20), but on the other hand
we also use parentheses to enclose the argument x of a function f(x) or of a
property such as P(x). However, the two usages of parentheses usually are unambiguous
from context. For instance, if a is a number, then a(b + c) denotes the expression
a × (b + c), whereas if f is a function, then f(b + c) denotes the output of f when
the input is b + c. Sometimes the argument of a function is denoted by subscripting
instead of parentheses; for instance, a sequence of natural numbers a₀, a₁, a₂, a₃, ...
is, strictly speaking, a function from N to N, but is denoted by n ↦ aₙ rather than
n ↦ a(n).


Remark 3.3.7 We do not necessarily require functions to be sets, nor do we require 
sets to be functions. Thus, it does not necessarily make sense to ask whether an object 
x is an element of a function f, and it does not necessarily make sense to apply a 
set A to an input x to create an output A(x). On the other hand, it is permissible to 
start with a function f : X → Y and construct its graph {(x, f(x)) : x ∈ X}, which
describes the function completely once the domain X and codomain Y are specified: 
see Sect. 3.5. 


We now define some basic concepts and notions for functions. The first notion is 
that of equality. 


Definition 3.3.8 (Equality of functions). Two functions f : X → Y, g : X′ → Y′
are said to be equal if their domains and codomains agree (i.e., X = X′ and Y = Y′),
and furthermore f(x) = g(x) for all x ∈ X. If f(x) and g(x) agree for some
values of x in the domain, but not others, then we do not consider f and g to be
equal.³ If two functions f, g have different domains, or different codomains, we also do
not consider them to be equal.


Remark 3.3.9 According to this definition, two functions that have different domains 
or different codomains are, strictly speaking, distinct functions. However, when it is 
safe to do so without causing confusion, it is sometimes useful to “abuse notation” by 
identifying together functions of different domains or codomains if their values agree
on their common domain of definition; this is analogous to the practice of “overloading”
an operator in software engineering. See the discussion after Definition 9.4.1
for one instance of this.


Example 3.3.10 (Informal) The functions x ↦ x² + 2x + 1 and x ↦ (x + 1)² are
equal on the domain R. The functions x ↦ x and x ↦ |x| are equal on the positive
real axis, but are not equal on R; thus the concept of equality of functions can depend
on the choice of domain.


Example 3.3.11 A rather boring example of a function is the empty function f : ∅ →
X from the empty set to a given set X. Since the empty set has no elements, we do
not need to specify what f does to any input. Nevertheless, just as the empty set is
a set, the empty function is a function, albeit not a particularly interesting one. Note
that for each set X, there is only one function from ∅ to X, since Definition 3.3.8
asserts that all functions from ∅ to X are equal (why?).


Remark 3.3.12 It is not immediately apparent that Definition 3.3.8 is compatible 
with the axioms of equality in Appendix A.7, although Exercise 3.3.1 provides evi- 
dence toward this compatibility. There are at least three ways to address this issue. 
One is to regard Definition 3.3.8 as an axiom about equality of functions rather than a 
definition. Another is to provide a more explicit definition of a function in which
Definition 3.3.8 becomes a theorem; for instance, one can define a function f: X → Y
to be an ordered triple (X, Y, G) consisting of a domain set X, a codomain set Y,
and a graph G = {(x, f(x)) : x ∈ X} that obeys the vertical line test, and use this
latter graph to define the value of f(x) ∈ Y for each element x of the domain; see
Exercise 3.5.10. A third way is to start with a mathematical universe 𝒰 without any
functions in it and use Definition 3.3.8 to create a larger extension of this universe
that contains function objects that behave as specified in Definition 3.3.8. This
final procedure however requires a bit more of the formalism of logic and model 
theory than is provided by this text, and so will not be detailed here. 


A fundamental operation available for functions is composition. 


Definition 3.3.13 (Composition). Let f: X → Y and g: Y → Z be two functions,
such that the codomain of f is the same set as the domain of g. We then define the
composition g ∘ f : X → Z of the two functions g and f to be the function defined
explicitly by the formula

(g ∘ f)(x) = g(f(x)).


If the codomain of f does not match the domain of g, we leave the composition
g ∘ f undefined.


3 In Chap. 8 of Analysis II, we shall introduce a weaker notion of equality, that of two functions
being equal almost everywhere.


It is easy to check that composition obeys the axiom of substitution (Exercise 
3.3.1). 


Example 3.3.14 Let f: N → N be the function f(n) := 2n, and let g: N → N be
the function g(n) := n + 3. Then g ∘ f is the function


g ∘ f(n) = g(f(n)) = g(2n) = 2n + 3,


thus for instance g ∘ f(1) = 5, g ∘ f(2) = 7, and so forth. Meanwhile, f ∘ g is the
function


f ∘ g(n) = f(g(n)) = f(n + 3) = 2(n + 3) = 2n + 6,


thus for instance f ∘ g(1) = 8, f ∘ g(2) = 10, and so forth.


The above example shows that composition is not commutative: f ∘ g and g ∘ f
are not necessarily the same function. However, composition is still associative:


Lemma 3.3.15 (Composition is associative). Let f: Z → W, g: Y → Z, and
h: X → Y be functions. Then f ∘ (g ∘ h) = (f ∘ g) ∘ h.


Proof Since g ∘ h is a function from X to Z, f ∘ (g ∘ h) is a function from X to W.
Similarly f ∘ g is a function from Y to W, and hence (f ∘ g) ∘ h is a function from
X to W. Thus f ∘ (g ∘ h) and (f ∘ g) ∘ h have the same domain and codomain. In
order to check that they are equal, we see from Definition 3.3.8 that we have to verify
that (f ∘ (g ∘ h))(x) = ((f ∘ g) ∘ h)(x) for all x ∈ X. But by Definition 3.3.13


(f ∘ (g ∘ h))(x) = f((g ∘ h)(x))
= f(g(h(x)))
= (f ∘ g)(h(x))
= ((f ∘ g) ∘ h)(x)


as desired.
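

The computations in Example 3.3.14 and the identity in Lemma 3.3.15 can be spot-checked numerically. The following Python sketch is only an illustration under stated assumptions: Python functions carry no domain or codomain, the helper name compose and the third map h are hypothetical choices not taken from the text, and checking finitely many inputs is of course not a proof.

```python
def compose(g, f):
    """Return the composition g o f: apply f first, then g."""
    return lambda x: g(f(x))

def f(n):
    return 2 * n          # f(n) := 2n, as in Example 3.3.14

def g(n):
    return n + 3          # g(n) := n + 3, as in Example 3.3.14

def h(n):
    return n * n          # hypothetical third map, not from the text

sample = range(10)        # a finite sample of N (a check, not a proof)

g_after_f = compose(g, f)     # n -> 2n + 3
f_after_g = compose(f, g)     # n -> 2n + 6

# Composition is not commutative: the two maps already differ at n = 1.
differ = g_after_f(1) != f_after_g(1)

# Associativity on the sample: f o (g o h) and (f o g) o h agree pointwise.
left = compose(f, compose(g, h))
right = compose(compose(f, g), h)
agree = all(left(n) == right(n) for n in sample)
```

Here g_after_f and f_after_g take the values 5, 7 and 8, 10 at the inputs 1, 2, matching the example.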


Remark 3.3.16 Note that while g appears to the left of f in the expression g ∘ f,
the function g ∘ f applies the right-most function f first, before applying g. This is
often confusing at first; it arises because we traditionally place a function f to the 
left of its input x rather than to the right. (There are some alternate mathematical 
notations in which the function is placed to the right of the input; thus we would 
write xf instead of f(x), but this notation has often proven to be more confusing 
than clarifying and has not as yet become particularly popular.)


We now describe certain special types of functions: one-to-one functions, onto
functions, and invertible functions. 


Definition 3.3.17 (One-to-one functions). A function f is one-to-one (or injective)
if different elements map to different elements:


x ≠ x′ ⟹ f(x) ≠ f(x′).


Equivalently, a function is one-to-one if


f(x) = f(x′) ⟹ x = x′.


Example 3.3.18 (Informal) The function f: Z → Z defined by f(n) := n² is not
one-to-one because the distinct elements −1, 1 map to the same element 1. On the
other hand, if we restrict this function to the natural numbers, defining the function
g: N → Z by g(n) := n², then g is now a one-to-one function. Thus the notion of
a one-to-one function depends not just on what the function does, but also what its 
domain is. 


Remark 3.3.19 If a function f: X → Y is not one-to-one, then one can find distinct
x and x’ in the domain X such that f(x) = f(x’), thus one can find two inputs which 
map to one output. Because of this, we say that f is two-to-one instead of one-to-one. 


Definition 3.3.20 (Onto functions). A function f is onto (or surjective) if every 
element in Y comes from applying f to some element in X: 


For every y ∈ Y, there exists x ∈ X such that f(x) = y.


Example 3.3.21 (Informal) The function f: Z → Z defined by f(n) := n² is not
onto because the negative numbers are not in the image of f. However, if we restrict
the codomain Z to the set A := {n² : n ∈ Z} of square numbers, then the function
g: Z → A defined by g(n) := n² is now onto. Thus the notion of an onto function
depends not just on what the function does, but also what its codomain is.


Remark 3.3.22 The concepts of injectivity and surjectivity are in many ways dual 
to each other; see Exercises 3.3.2, 3.3.4, 3.3.5 for some evidence of this. 


Definition 3.3.23 (Bijective functions). Functions f: X → Y which are both one-
to-one and onto are also called bijective or invertible.


Example 3.3.24 Let f: {0, 1, 2} → {3, 4} be the function f(0) := 3, f(1) := 3,
f(2) := 4. This function is not bijective because if we set y = 3, then there is more
than one x in {0, 1, 2} such that f(x) = y (this is a failure of injectivity). Now let
g: {0, 1} → {2, 3, 4} be the function g(0) := 2, g(1) := 3; then g is not bijective
because if we set y = 4, then there is no x for which g(x) = y (this is a failure of
surjectivity). Now let h: {0, 1, 2} → {3, 4, 5} be the function h(0) := 3, h(1) := 4,
h(2) := 5. Then h is bijective, because each of the elements 3, 4, 5 comes from
exactly one element from 0, 1, 2.


Example 3.3.25 The function f: N → N\{0} defined by f(n) := n++ is a bijection
(in fact, this fact is simply restating Lemma 2.2.10). On the other hand, the
function g: N → N defined by the same definition g(n) := n++ is not a bijection.
Thus the notion of a bijective function depends not just on what the function does, 
but also what its domain and codomain are. 
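

For functions on small finite sets, such as f, g, h in Example 3.3.24, Definitions 3.3.17, 3.3.20 and 3.3.23 can be tested mechanically. The sketch below assumes a function is represented by a Python dict (whose keys form the domain) together with an explicitly supplied codomain; the helper names are hypothetical and chosen only for illustration.

```python
def is_injective(mapping):
    """True if distinct inputs of the dict always give distinct outputs."""
    values = list(mapping.values())
    return len(values) == len(set(values))

def is_surjective(mapping, codomain):
    """True if every element of the codomain is attained as an output."""
    return set(mapping.values()) == set(codomain)

def is_bijective(mapping, codomain):
    return is_injective(mapping) and is_surjective(mapping, codomain)

# The three functions of Example 3.3.24, each with its codomain.
f, f_codomain = {0: 3, 1: 3, 2: 4}, {3, 4}
g, g_codomain = {0: 2, 1: 3}, {2, 3, 4}
h, h_codomain = {0: 3, 1: 4, 2: 5}, {3, 4, 5}

f_report = (is_injective(f), is_surjective(f, f_codomain))   # injectivity fails
g_report = (is_injective(g), is_surjective(g, g_codomain))   # surjectivity fails
h_report = is_bijective(h, h_codomain)                       # both hold
```

Passing the codomain as a separate argument reflects the point made in these examples: whether a function is onto or bijective cannot be read off from its values alone.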


Remark 3.3.26 If a function x ↦ f(x) is bijective, then we sometimes call f a
perfect matching or a one-to-one correspondence (not to be confused with the notion
of a one-to-one function) and denote the action of f using the notation x ↔ f(x)
instead of x ↦ f(x). Thus for instance the function h in Example 3.3.24 is the
one-to-one correspondence 0 ↔ 3, 1 ↔ 4, 2 ↔ 5.


Remark 3.3.27 A common error is to say that a function f: X → Y is bijective iff
“for every x in X, there is exactly one y in Y such that y = f(x)”. This is not what it
means for f to be bijective; rather, this is merely stating what it means for f to be a
function. A function cannot map one element to two different elements, for instance
one cannot have a function f for which f(0) = 1 and also f(0) = 2. The functions
f, g given in Example 3.3.24 are not bijective, but they are still functions, since each
input still gives exactly one output.


If f is bijective, then for every y ∈ Y, there is exactly one x such that f(x) = y
(there is at least one because of surjectivity, and at most one because of injectivity).
This value of x is denoted f⁻¹(y); thus f⁻¹ is a function from Y to X. We call f⁻¹
the inverse of f.
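

For a bijection between finite sets the inverse can be written down by swapping inputs and outputs. The sketch below again assumes a dict representation with an explicit codomain; the function name inverse is a hypothetical helper, and the bijection h is the one from Example 3.3.24.

```python
def inverse(mapping, codomain):
    """Return the inverse of a bijective dict-encoded function as a new dict."""
    values = list(mapping.values())
    injective = len(values) == len(set(values))
    surjective = set(values) == set(codomain)
    if not (injective and surjective):
        raise ValueError("only a bijective function has an inverse")
    # Each y has exactly one preimage x, so swapping pairs is well defined.
    return {y: x for x, y in mapping.items()}

h = {0: 3, 1: 4, 2: 5}                 # the bijection h of Example 3.3.24
h_inv = inverse(h, {3, 4, 5})

# Both round trips return the starting element on this example.
round_trip_domain = all(h_inv[h[x]] == x for x in h)
round_trip_codomain = all(h[h_inv[y]] == y for y in h_inv)
```

The explicit check before swapping matters: for a non-injective dict the comprehension would silently keep only one preimage of each repeated value.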


— Exercises — 


Exercise 3.3.1 Show that the definition of equality in Definition 3.3.8 is reflexive, symmetric, and 
transitive. Also verify the substitution property: if f, f̃: X → Y and g, g̃: Y → Z are functions
such that f = f̃ and g = g̃, then g ∘ f = g̃ ∘ f̃. (Of course, these statements are immediate from
the axioms of equality in Appendix A.7 applied directly to the functions in question, but the point of
the exercise is to show that they can also be established by instead applying the axioms of equality
to elements of the domain and codomain of these functions, rather than to the functions themselves.)


Exercise 3.3.2 Let f: X → Y and g: Y → Z be functions. Show that if f and g are both injective,
then so is g ∘ f; similarly, show that if f and g are both surjective, then so is g ∘ f.


Exercise 3.3.3 When is the empty function into a given set X injective? surjective? bijective? 


Exercise 3.3.4 In this exercise we give some cancellation laws for composition. Let f: X → Y,
f̃: X → Y, g: Y → Z, and g̃: Y → Z be functions. Show that if g ∘ f = g ∘ f̃ and g is injective,
then f = f̃. Is the same statement true if g is not injective? Show that if g ∘ f = g̃ ∘ f and f is
surjective, then g = g̃. Is the same statement true if f is not surjective?


Exercise 3.3.5 Let f: X → Y and g: Y → Z be functions. Show that if g ∘ f is injective, then
f must be injective. Is it true that g must also be injective? Show that if g ∘ f is surjective, then g
must be surjective. Is it true that f must also be surjective? 


Exercise 3.3.6 Let f: X → Y be a bijective function, and let f⁻¹ : Y → X be its inverse. Verify
the cancellation laws f⁻¹(f(x)) = x for all x ∈ X and f(f⁻¹(y)) = y for all y ∈ Y. Conclude
that f⁻¹ is also invertible and has f as its inverse (thus (f⁻¹)⁻¹ = f).


Exercise 3.3.7 Let f: X → Y and g: Y → Z be functions. Show that if f and g are bijective,
then so is g ∘ f, and we have (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹.


Exercise 3.3.8 If X is a subset of Y, let ι_{X→Y} : X → Y be the inclusion map from X to Y, defined
by mapping x ↦ x for all x ∈ X, i.e., ι_{X→Y}(x) := x for all x ∈ X. The map ι_{X→X} is in particular
called the identity map on X.


(a) Show that if X ⊆ Y ⊆ Z then ι_{Y→Z} ∘ ι_{X→Y} = ι_{X→Z}.

(b) Show that if f: A → B is any function, then f = f ∘ ι_{A→A} = ι_{B→B} ∘ f.

(c) Show that, if f: A → B is a bijective function, then f ∘ f⁻¹ = ι_{B→B} and f⁻¹ ∘ f = ι_{A→A}.

(d) Show that if X and Y are disjoint sets, and f: X → Z and g: Y → Z are functions, then there
is a unique function h: X ∪ Y → Z such that h ∘ ι_{X→X∪Y} = f and h ∘ ι_{Y→X∪Y} = g.

(e) Show that the hypothesis that X and Y are disjoint can be dropped in (d) if one adds the
additional hypothesis that f(x) = g(x) for all x ∈ X ∩ Y.


3.4 Images and Inverse Images 


We know that a function f: X → Y from a set X to a set Y can take individual
elements x ∈ X to elements f(x) ∈ Y. Functions can also take subsets in X to
subsets in Y:


Definition 3.4.1 (Images of sets). If f: X → Y is a function from X to Y, and S is
a subset of X, we define⁴ f(S) to be the set


f(S) := {f(x) : x ∈ S};


this set is a subset of Y and is sometimes called the image of S under the map f. We
sometimes call f(S) the forward image of S to distinguish it from the concept of the
inverse image f⁻¹(S) of S, which is defined below.


Note that the set f(S) is well-defined thanks to the axiom of replacement (Axiom
3.7). One can also define f(S) using the axiom of specification (Axiom 3.6) instead
of replacement, but we leave this as an exercise to the reader. The image f(X) of
the domain is also known as the range of the function f: X → Y; it is a subset of
the codomain Y.


Example 3.4.2 If f: N → N is the map f(x) = 2x, then the forward image of
{1, 2, 3} is {2, 4, 6}:

f({1, 2, 3}) = {2, 4, 6}.


More informally, to compute f(S), we take every element x of S and apply f to
each element individually, and then put all the resulting objects together to form a
new set.
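

The informal recipe above translates almost word for word into a set comprehension. The following Python sketch assumes S is a finite set; the names image and double are hypothetical helpers introduced only for this illustration.

```python
def image(f, S):
    """Forward image f(S): apply f to each element of S and collect the results."""
    return {f(x) for x in S}

def double(x):
    return 2 * x      # the map f(x) = 2x of Example 3.4.2

forward = image(double, {1, 2, 3})
```

Because the result is a Python set, repeated outputs are merged automatically, mirroring the fact that f(S) is a set rather than a list of values.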


4 In principle this notation could collide with the existing notation f (x) for the evaluation of f at x, 
if S turns out to both be a subset of X and an element of X. However, we will ignore this potential 
collision as it rarely occurs in practice.


In the above example, the image had the same size as the original set. But sometimes
the image can be smaller, because f is not one-to-one (see Definition 3.3.17):


Example 3.4.3 (Informal) Let Z be the set of integers (which we will define rigorously
in the next section) and let f: Z → Z be the map f(x) = x². Then


f({−1, 0, 1, 2}) = {0, 1, 4}.


Note that f is not one-to-one because f(−1) = f(1).


Note that

x ∈ S ⟹ f(x) ∈ f(S)


but in general

f(x) ∈ f(S) ⇏ x ∈ S;


for instance in the above informal example, f(−2) lies in the set f({−1, 0, 1, 2}),
but −2 is not in {−1, 0, 1, 2}. The correct statement is


y ∈ f(S) ⟺ y = f(x) for some x ∈ S


(why?).


Example 3.4.4 From Definition 3.3.20 we see that a function f: X → Y is onto if
and only if f(X) = Y.


Definition 3.4.5 (Inverse images) If U is a subset of Y, we define the set f⁻¹(U)
to be the set

f⁻¹(U) := {x ∈ X : f(x) ∈ U}.


In other words, f⁻¹(U) consists of all the elements of X which map into U:

f(x) ∈ U ⟺ x ∈ f⁻¹(U).


We call f⁻¹(U) the inverse image of U.


Example 3.4.6 If f: N → N is the map f(x) = 2x, then f({1, 2, 3}) = {2, 4, 6},
but f⁻¹({1, 2, 3}) = {1}. Thus the forward image of {1, 2, 3} and the backwards
image of {1, 2, 3} are quite different sets. Also note that


f(f⁻¹({1, 2, 3})) ≠ {1, 2, 3}


(why?).


Example 3.4.7 (Informal) If f: Z → Z is the map f(x) = x², then


f⁻¹({0, 1, 4}) = {−2, −1, 0, 1, 2}.


Note that f does not have to be invertible in order for f⁻¹(U) to make sense. Also
note that images and inverse images do not quite invert each other, for instance we
have


f⁻¹(f({−1, 0, 1, 2})) ≠ {−1, 0, 1, 2}


(why?).
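

The sets in Examples 3.4.6 and 3.4.7 can be recomputed with a short Python sketch. Since N and Z are infinite, the code works with finite windows of them; this truncation is an assumption of the sketch, chosen large enough that it does not affect these particular computations. The helper names are hypothetical.

```python
def image(f, S):
    """Forward image f(S) of a finite set S."""
    return {f(x) for x in S}

def inverse_image(f, domain, U):
    """Inverse image f^{-1}(U): the elements of the domain that f maps into U."""
    return {x for x in domain if f(x) in U}

def double(x):
    return 2 * x

def square(x):
    return x * x

N_window = range(0, 50)      # finite stand-in for N (assumption)
Z_window = range(-50, 51)    # finite stand-in for Z (assumption)

# Example 3.4.6: forward and inverse images of {1, 2, 3} under x -> 2x.
pre_double = inverse_image(double, N_window, {1, 2, 3})
back_and_forth = image(double, pre_double)

# Example 3.4.7: inverse image of {0, 1, 4} under x -> x^2.
pre_square = inverse_image(square, Z_window, {0, 1, 4})
forth_and_back = inverse_image(square, Z_window, image(square, {-1, 0, 1, 2}))
```

Note that inverse_image never inverts f; it only filters the domain, which is why it makes sense for maps such as x ↦ x² that are not invertible.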


Remark 3.4.8 If f is a bijective function, then we have defined f⁻¹ in two slightly
different ways, but this is not an issue because the two definitions agree in this case 
(Exercise 3.4.1). 


As remarked earlier, functions are not necessarily sets. However, we do consider 
functions to be a type of object, and in particular we should be able to consider sets 
of functions. In particular, we should be able to consider the set of all functions from 
a set X to a set Y. To do this we need to introduce another axiom to set theory:


Axiom 3.11 (Power set axiom). Let X and Y be sets. Then there exists a set, denoted
Y^X, which consists of all the functions from X to Y, thus


f ∈ Y^X ⟺ (f is a function with domain X and codomain Y).


Example 3.4.9 Let X = {4, 7} and Y = {0, 1}. Then the set Y^X consists of four
functions: the function that maps 4 ↦ 0 and 7 ↦ 0; the function that maps 4 ↦ 0
and 7 ↦ 1; the function that maps 4 ↦ 1 and 7 ↦ 0; and the function that maps
4 ↦ 1 and 7 ↦ 1. The reason we use the notation Y^X to denote this set is that if Y
has n elements and X has m elements, then one can show that Y^X has n^m elements;
see Proposition 3.6.14(f).
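

For finite X and Y the set Y^X can be listed explicitly, which makes the count n^m easy to observe. The sketch below assumes that functions are encoded as dicts and that the elements of X and Y can be sorted; the name all_functions is a hypothetical helper. It relies on itertools.product from the Python standard library.

```python
from itertools import product

def all_functions(X, Y):
    """List every function from finite X to finite Y, each encoded as a dict."""
    inputs = sorted(X)
    outputs = sorted(Y)
    # One function for each way of assigning an output to every input.
    return [dict(zip(inputs, values))
            for values in product(outputs, repeat=len(inputs))]

functions = all_functions({4, 7}, {0, 1})     # the set Y^X of Example 3.4.9
count = len(functions)                        # n^m with n = 2, m = 2

# With an empty domain there is exactly one function, the empty function.
from_empty = all_functions(set(), {0, 1})
```

The last line agrees with Example 3.3.11: the only function out of the empty set is the empty one, encoded here as the empty dict.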


One consequence of this axiom is 


Lemma 3.4.10 Let X be a set. Then the set

{Y : Y is a subset of X}

is a set. That is to say, there exists a set Z such that

Y ∈ Z ⟺ Y ⊆ X

for all objects Y.

Proof See Exercise 3.4.6.
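

For a finite set X, the collection of subsets in Lemma 3.4.10 can be enumerated by following the route suggested in the hint to Exercise 3.4.6: run over every function f from X to {0, 1} and record the inverse image f⁻¹({1}). The Python sketch below is only a finite illustration of that idea, not a proof of the lemma; the name subsets_of is hypothetical, and subsets are stored as frozensets so that they can be elements of a set.

```python
from itertools import product

def subsets_of(X):
    """Collect all subsets of a finite set X via functions from X to {0, 1}."""
    elements = sorted(X)
    found = set()
    for bits in product((0, 1), repeat=len(elements)):
        f = dict(zip(elements, bits))                  # one function X -> {0, 1}
        preimage_of_one = frozenset(x for x in elements if f[x] == 1)
        found.add(preimage_of_one)                     # the subset f^{-1}({1})
    return found

all_subsets = subsets_of({"a", "b", "c"})
```

Each subset arises from exactly one function (its indicator function), so a three-element set yields 2 · 2 · 2 = 8 subsets.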


Remark 3.4.11 The set {Y : Y is a subset of X} is known as the power set of X and
is denoted 2^X. For instance, if a, b, c are distinct objects, we have


2^{a,b,c} = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.


Note that while {a, b, c} has 3 elements, 2^{a,b,c} has 2³ = 8 elements. This gives a
hint as to why we refer to the power set of X as 2^X; we return to this issue in Chap. 8.


For sake of completeness, let us now add one further axiom to our set theory,
in which we enhance the axiom of pairwise union to allow unions of much larger 
collections of sets. 


Axiom 3.12 (Union). Let A be a set, all of whose elements are themselves sets.
Then there exists a set ⋃A whose elements are precisely those objects which are
elements of the elements of A, thus for all objects x


x ∈ ⋃A ⟺ (x ∈ S for some S ∈ A).


Example 3.4.12 If A = {{2, 3}, {3, 4}, {4, 5}}, then ⋃A = {2, 3, 4, 5} (why?).


The axiom of union, combined with the axiom of pair set, implies the axiom of 
pairwise union (Exercise 3.4.8). Another important consequence of this axiom is that 
if one has some set I, and for every element α ∈ I we have some set A_α, then we
can form the union set ⋃_{α∈I} A_α by defining


⋃_{α∈I} A_α := ⋃{A_α : α ∈ I},


which is a set thanks to the axiom of replacement and the axiom of union. Thus
for instance, if I = {1, 2, 3}, A_1 := {2, 3}, A_2 := {3, 4}, and A_3 := {4, 5}, then
⋃_{α∈{1,2,3}} A_α = {2, 3, 4, 5}. More generally, we see that for any object y,


y ∈ ⋃_{α∈I} A_α ⟺ (y ∈ A_α for some α ∈ I). (3.2)


In situations like this, we often refer to I as an index set, and the elements α of this
index set as labels; the sets A_α are then called a family of sets and are indexed by the
labels α ∈ I. Note that if I was empty, then ⋃_{α∈I} A_α would automatically also be
empty (why?).

We can similarly form intersections of families of sets, as long as the index set is
non-empty. More specifically, given any non-empty set I, and given an assignment
of a set A_α to each α ∈ I, we can define the intersection ⋂_{α∈I} A_α by first choosing
some element β of I (which we can do since I is non-empty), and setting


⋂_{α∈I} A_α := {x ∈ A_β : x ∈ A_α for all α ∈ I}, (3.3)


which is a set by the axiom of specification. This definition may look like it depends
on the choice of β, but it does not (Exercise 3.4.9). Observe that for any object y,


y ∈ ⋂_{α∈I} A_α ⟺ (y ∈ A_α for all α ∈ I) (3.4)


(compare with (3.2)).


Remark 3.4.13 The axioms of set theory that we have introduced (Axioms 3.1–3.12,
excluding the dangerous Axiom 3.9) are known as the Zermelo–Fraenkel
axioms of set theory,⁵ after Ernst Zermelo (1871–1953) and Abraham Fraenkel
(1891–1965). There is one further axiom we will eventually need, the famous axiom
of choice (see Sect. 8.4), giving rise to the Zermelo—Fraenkel—Choice (ZFC) axioms 
of set theory, but we will not need this axiom for some time. 
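

Returning to the indexed unions (3.2) and intersections (3.3) defined before this remark: for a finite family they are easy to experiment with. The Python sketch below assumes a family of sets is stored as a dict from labels to sets; the function names are hypothetical. The intersection is built as in (3.3), by fixing one label β and filtering A_β.

```python
def indexed_union(family):
    """Union of the sets A_alpha over all labels alpha of the dict `family`."""
    result = set()
    for alpha in family:
        result |= family[alpha]
    return result

def indexed_intersection(family):
    """Intersection of the sets A_alpha, following (3.3); needs a non-empty index set."""
    labels = list(family)
    if not labels:
        raise ValueError("the index set must be non-empty")
    beta = labels[0]      # any label works; the result does not depend on it
    return {x for x in family[beta]
            if all(x in family[alpha] for alpha in labels)}

family = {1: {2, 3}, 2: {3, 4}, 3: {4, 5}}      # A_1, A_2, A_3 from the text
union = indexed_union(family)
common = indexed_intersection(family)            # no element lies in all three
pair_common = indexed_intersection({1: {2, 3}, 2: {3, 4}})
```

The union over an empty family comes out empty, while the intersection is deliberately refused in that case, matching the requirement that the index set be non-empty.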


— Exercises — 


Exercise 3.4.1 Let f: X → Y be a bijective function, and let f⁻¹ : Y → X be its inverse. Let V
be any subset of Y. Prove that the forward image of V under f⁻¹ is the same set as the inverse image
of V under f; thus the fact that both sets are denoted by f⁻¹(V) will not lead to any inconsistency.


Exercise 3.4.2 Let f: X → Y be a function from one set X to another set Y, let S be a subset of
X, and let U be a subset of Y.


(i) What, in general, can one say about f⁻¹(f(S)) and S?
(ii) What about f(f⁻¹(U)) and U?
(iii) What about f⁻¹(f(f⁻¹(U))) and f⁻¹(U)?


Exercise 3.4.3 Let A, B be two subsets of a set X, and let f: X → Y be a function. Show that
f(A ∩ B) ⊆ f(A) ∩ f(B), that f(A)\f(B) ⊆ f(A\B), and that f(A ∪ B) = f(A) ∪ f(B). For the first
two statements, is it true that the ⊆ relation can be improved to =?


Exercise 3.4.4 Let f: X → Y be a function from one set X to another set Y, and let U, V be subsets
of Y. Show that f⁻¹(U ∪ V) = f⁻¹(U) ∪ f⁻¹(V), that f⁻¹(U ∩ V) = f⁻¹(U) ∩ f⁻¹(V), and
that f⁻¹(U\V) = f⁻¹(U)\f⁻¹(V).


Exercise 3.4.5 Let f: X → Y be a function from one set X to another set Y. Show that
f(f⁻¹(S)) = S for every S ⊆ Y if and only if f is surjective. Show that f⁻¹(f(S)) = S for
every S ⊆ X if and only if f is injective.


Exercise 3.4.6 


(i) Prove Lemma 3.4.10. (Hint: start with the set {0, 1}^X and apply the replacement axiom, replacing
each function f with the object f⁻¹({1}).) See also Exercise 3.5.11.

(ii) Conversely, show that Axiom 3.11 can be deduced from the preceding axioms of set theory if one
accepts Lemma 3.4.10 as an axiom. (This may help explain why we refer to Axiom 3.11 as
the “power set axiom”.)


Exercise 3.4.7 Let X, Y be sets. Define a partial function from X to Y to be any function f: X′ →
Y′ whose domain X′ is a subset of X, and whose codomain Y′ is a subset of Y. Show that the
collection of all partial functions from X to Y is itself a set. (Hint: use Exercise 3.4.6, the power set 
axiom, the replacement axiom, and the union axiom.) 


Exercise 3.4.8 Show that Axiom 3.5 can be deduced from Axiom 3.1, Axiom 3.4, and Axiom 3.12. 


Exercise 3.4.9 Show that if β and β′ are two elements of a set I, and to each α ∈ I we assign a set
A_α, then

{x ∈ A_β : x ∈ A_α for all α ∈ I} = {x ∈ A_{β′} : x ∈ A_α for all α ∈ I},


and so the definition of ⋂_{α∈I} A_α defined in (3.3) does not depend on β. Also explain why (3.4) is
true.


5 These axioms are formulated slightly differently in other texts, but all the formulations can be
shown to be equivalent to each other.


Exercise 3.4.10 Suppose that I and J are two sets, and for all α ∈ I ∪ J let A_α be a set. Show
that (⋃_{α∈I} A_α) ∪ (⋃_{α∈J} A_α) = ⋃_{α∈I∪J} A_α. If I and J are non-empty, show that

(⋂_{α∈I} A_α) ∩ (⋂_{α∈J} A_α) = ⋂_{α∈I∪J} A_α.


Exercise 3.4.11 Let X be a set, let I be a non-empty set, and for all α ∈ I let A_α be a subset of X.
Show that

X \ ⋃_{α∈I} A_α = ⋂_{α∈I} (X \ A_α)

and

X \ ⋂_{α∈I} A_α = ⋃_{α∈I} (X \ A_α).


This should be compared with De Morgan’s laws in Proposition 3.1.27 (although one cannot derive
the above identities directly from De Morgan’s laws, as I could be infinite).


3.5 Cartesian Products 


In addition to the basic operations of union, intersection, and differencing, another 
fundamental operation on sets is that of the Cartesian product. To define this notion, 
we first need the concept of an ordered pair. 


Definition 3.5.1 (Ordered pair). If x and y are any objects (possibly equal), we
define the ordered pair (x, y) to be a new object, consisting of x as its first component
and y as its second component. Two ordered pairs (x, y) and (x′, y′) are considered
equal if and only if both their components match, i.e.,


(x, y) = (x′, y′) ⟺ (x = x′ and y = y′). (3.5)


This notion of equality is consistent with the usual axioms of equality (Exercise
3.5.3). Thus for instance, the pair (3, 5) is equal to the pair (2 + 1, 3 + 2), but is
distinct from the pairs (5, 3), (3, 3), and (2, 5). (This is in contrast to sets, where
{3, 5} and {5, 3} are equal.)


Remark 3.5.2 Strictly speaking, this definition is partly an axiom, because we have 
simply postulated that given any two objects x and y, that an object of the form (x, y) 
exists. However, it is possible to define an ordered pair using the axioms of set theory 
in such a way that we do not need any further postulates (see Exercise 3.5.1). 


Remark 3.5.3 We have now “overloaded” the parenthesis symbols () once again;
they now are not only used to denote grouping of operators and arguments of func- 
tions, but also to enclose ordered pairs. This is usually not a problem in practice as 
one can still determine what usage the symbols () were intended for from context. 


Definition 3.5.4 (Cartesian product). If X and Y are sets, then we define the Cartesian
product X × Y to be the collection of ordered pairs, whose first component lies
in X and second component lies in Y, thus

X × Y = {(x, y) : x ∈ X, y ∈ Y}

or equivalently

a ∈ (X × Y) ⟺ (a = (x, y) for some x ∈ X and y ∈ Y).


One can show that the Cartesian product X × Y is in fact a set; see Exercise 3.5.1.


Example 3.5.5 If X := {1, 2} and Y := {3, 4, 5}, then

X × Y = {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)}

and

Y × X = {(3, 1), (4, 1), (5, 1), (3, 2), (4, 2), (5, 2)}.


Thus, strictly speaking, X × Y and Y × X are different sets, although they are very
similar. For instance, they always have the same number of elements (Exercise 3.6.5).
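

Python tuples compare component by component, in the spirit of (3.5), so they give a convenient way to recompute Example 3.5.5. The sketch below assumes finite sets and uses a hypothetical helper cartesian_product; it also shows the swap map (x, y) ↦ (y, x), one natural way to match X × Y with Y × X.

```python
def cartesian_product(X, Y):
    """All ordered pairs (x, y) with x in X and y in Y, for finite X and Y."""
    return {(x, y) for x in X for y in Y}

X = {1, 2}
Y = {3, 4, 5}
XY = cartesian_product(X, Y)
YX = cartesian_product(Y, X)

# Swapping the components of every pair in X x Y produces exactly Y x X.
swapped = {(y, x) for (x, y) in XY}

# Ordered pairs care about order; sets do not.
pair_facts = ((3, 5) == (2 + 1, 3 + 2), (3, 5) != (5, 3), {3, 5} == {5, 3})
```

The two products are different sets of the same size, in agreement with the remark that closes the example.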


Let f: X × Y → Z be a function whose domain X × Y is a Cartesian product of
two other sets X and Y. Then f can either be thought of as a function of one variable,
mapping the single input of an ordered pair (x, y) in X × Y to an output⁶ f(x, y) in
Z, or as a function of two variables, mapping an input x ∈ X and another input y ∈ Y
to a single output f(x, y) in Z. While the two notions are technically different, we
will not bother to distinguish the two, and think of f simultaneously as a function of
one variable with domain X × Y and as a function of two variables with domains X
and Y. Thus for instance the addition operation + on the natural numbers can now
be re-interpreted as a function +: N × N → N, defined by (x, y) ↦ x + y.

Once one has the notion of an ordered pair, one can also define an ordered triple 
(x, y, z) of three objects x, y, z by the formula (x, y, z) := ((x, y), z). One could
continue in this fashion and define ordered quadruples, etc., but we shall instead use 
a different construction to build ordered n-tuples: 


Definition 3.5.6 (Ordered n-tuple and n-fold Cartesian product). Let n be a natural
number. An ordered n-tuple (x_i)_{1≤i≤n} (also denoted (x_1, ..., x_n)) is a collection of
objects x_i, one for every natural number i between 1 and n; we refer to x_i as the
ith component of the ordered n-tuple. Two ordered n-tuples (x_i)_{1≤i≤n} and (y_i)_{1≤i≤n}
are said to be equal iff x_i = y_i for all 1 ≤ i ≤ n. If (X_i)_{1≤i≤n} is an ordered n-tuple
of sets, we define their Cartesian product ∏_{1≤i≤n} X_i (also denoted ∏_{i=1}^n X_i or
X_1 × ... × X_n) by


∏_{1≤i≤n} X_i := {(x_i)_{1≤i≤n} : x_i ∈ X_i for all 1 ≤ i ≤ n}.


6 Here (and in the rest of this text) we adopt the very common practice of abbreviating f((x, y)) as
f(x, y).


Again, this definition simply postulates that an ordered n-tuple and a Cartesian
product always exist when needed, but using the axioms of set theory one can explicitly
construct these objects; see Exercise 3.5.2.


Remark 3.5.7 One can generalize this construction to infinite Cartesian products; 
see Definition 8.4.1. 


Example 3.5.8 Let a_1, b_1, a_2, b_2, a_3, b_3 be objects, and let X_1 := {a_1, b_1}, X_2 :=
{a_2, b_2}, and X_3 := {a_3, b_3}. Then we have


X_1 × X_2 × X_3 = {(a_1, a_2, a_3), (a_1, a_2, b_3), (a_1, b_2, a_3), (a_1, b_2, b_3),
(b_1, a_2, a_3), (b_1, a_2, b_3), (b_1, b_2, a_3), (b_1, b_2, b_3)}

(X_1 × X_2) × X_3 = {((a_1, a_2), a_3), ((a_1, a_2), b_3), ((a_1, b_2), a_3), ((a_1, b_2), b_3),
((b_1, a_2), a_3), ((b_1, a_2), b_3), ((b_1, b_2), a_3), ((b_1, b_2), b_3)}

X_1 × (X_2 × X_3) = {(a_1, (a_2, a_3)), (a_1, (a_2, b_3)), (a_1, (b_2, a_3)), (a_1, (b_2, b_3)),
(b_1, (a_2, a_3)), (b_1, (a_2, b_3)), (b_1, (b_2, a_3)), (b_1, (b_2, b_3))}.


Thus, strictly speaking, the sets X_1 × X_2 × X_3, (X_1 × X_2) × X_3, and X_1 × (X_2 ×
X_3) are distinct. However, they are clearly very related to each other (for instance,
there are obvious bijections between any two of the three sets), and it is common
in practice to neglect the minor distinctions between these sets and pretend that
they are in fact equal. Thus a function f: X_1 × X_2 × X_3 → Y can be thought of
as a function of one variable (x_1, x_2, x_3) ∈ X_1 × X_2 × X_3, or as a function of three
variables x_1 ∈ X_1, x_2 ∈ X_2, x_3 ∈ X_3, or as a function of two variables x_1 ∈ X_1,
(x_2, x_3) ∈ X_2 × X_3, and so forth; we will not bother to distinguish between these
different perspectives.
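

The distinction drawn in Example 3.5.8 is visible in Python, where a flat triple and a nested pair are different objects. In the sketch below the six objects are modelled by short strings, which is an assumption made purely for illustration, and the two flattening functions are hypothetical names for the obvious bijections mentioned above.

```python
X1, X2, X3 = {"a1", "b1"}, {"a2", "b2"}, {"a3", "b3"}

flat = {(x1, x2, x3) for x1 in X1 for x2 in X2 for x3 in X3}
left = {((x1, x2), x3) for x1 in X1 for x2 in X2 for x3 in X3}
right = {(x1, (x2, x3)) for x1 in X1 for x2 in X2 for x3 in X3}

def flatten_left(t):
    """Send ((x1, x2), x3) to (x1, x2, x3)."""
    (x1, x2), x3 = t
    return (x1, x2, x3)

def flatten_right(t):
    """Send (x1, (x2, x3)) to (x1, x2, x3)."""
    x1, (x2, x3) = t
    return (x1, x2, x3)

# The three sets share no elements, yet each has eight elements.
pairwise_disjoint = (flat.isdisjoint(left) and flat.isdisjoint(right)
                     and left.isdisjoint(right))
sizes = (len(flat), len(left), len(right))

# Flattening maps the nested versions onto the flat product.
left_flattened = {flatten_left(t) for t in left}
right_flattened = {flatten_right(t) for t in right}
```

Since each flattened set has as many elements as the set it came from, the flattening maps are bijections on this example.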


Remark 3.5.9 An ordered n-tuple (x_1, ..., x_n) of objects is also called an ordered
sequence of n elements, or a finite sequence for short. In Chap. 5 we shall also 
introduce the very useful concept of an infinite sequence. 


Example 3.5.10 If x is an object, then (x) is a 1-tuple, which we shall identify with
x itself (even though the two are, strictly speaking, not the same object). Then if X_1
is any set, then the Cartesian product ∏_{1≤i≤1} X_i is just X_1 (why?). Also, the empty
Cartesian product ∏_{1≤i≤0} X_i gives, not the empty set {}, but rather the singleton set
{()} whose only element is the 0-tuple (), also known as the empty tuple.


If n is a natural number, we often write X^n as shorthand for the n-fold Cartesian
product X^n := ∏_{1≤i≤n} X. Thus X^1 is essentially the same set as X (if we ignore
the distinction between an object x and the 1-tuple (x)), while X^2 is essentially the
Cartesian product X × X. The set X^0 is a singleton set {()} (why?).

We can now generalize the single choice lemma (Lemma 3.1.5) to allow for 
multiple (but finite) number of choices. 


Lemma 3.5.11 (Finite choice). Let n ≥ 1 be a natural number, and for each natural
number 1 ≤ i ≤ n, let X_i be a non-empty set. Then there exists an n-tuple (x_i)_{1≤i≤n}
such that x_i ∈ X_i for all 1 ≤ i ≤ n. In other words, if each X_i is non-empty, then the
set ∏_{1≤i≤n} X_i is also non-empty.


Proof We induct on n (starting with the base case n = 1; the claim is also vacuously
true with n = 0 but is not particularly interesting in that case). When n = 1 the
claim follows from Lemma 3.1.5 (why?). Now suppose inductively that the claim
has already been proven for some n; we will now prove it for n++. Let X_1, ..., X_{n++}
be a collection of non-empty sets. By induction hypothesis, we can find an n-tuple
(x_i)_{1≤i≤n} such that x_i ∈ X_i for all 1 ≤ i ≤ n. Also, since X_{n++} is non-empty, by
Lemma 3.1.5 we may find an object a such that a ∈ X_{n++}. If we thus define the
n++-tuple (y_i)_{1≤i≤n++} by setting y_i := x_i when 1 ≤ i ≤ n and y_i := a when i = n++, it
is clear that y_i ∈ X_i for all 1 ≤ i ≤ n++, thus closing the induction.
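

The induction in this proof has the shape of a recursive procedure: choose a tuple for the first n sets, then append one element of the last set. The Python sketch below mirrors that structure for a finite list of finite sets. It is an illustration only; the name choose_tuple is hypothetical, and next(iter(...)) plays the role of the single choice made via Lemma 3.1.5.

```python
def choose_tuple(sets):
    """Return a tuple containing one element from each set in the list `sets`."""
    if not sets:
        return ()                          # n = 0: the empty tuple
    earlier = choose_tuple(sets[:-1])      # induction hypothesis: an n-tuple
    last = sets[-1]
    if not last:
        raise ValueError("every set must be non-empty")
    a = next(iter(last))                   # a single choice from the last set
    return earlier + (a,)                  # extend to an (n++)-tuple

sets = [{1, 2}, {3}, {4, 5, 6}]
chosen = choose_tuple(sets)
```

Which element is picked from each set is left unspecified, just as in the lemma; only membership of each component in the corresponding set is guaranteed.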


Remark 3.5.12 It is intuitively plausible that this lemma should be extended to allow 
for an infinite number of choices, but this cannot be done automatically; it requires 
an additional axiom, the axiom of choice. See Section 8.4. 


— Exercises — 


Exercise 3.5.1 (i) Suppose we define the ordered pair (x, y) for any objects x and y by the formula 
(x, y) := {{x}, {x, y}} (thus using several applications of Axiom 3.4). Thus for instance (1, 2) 
is the set {{1}, {1, 2}}, (2, 1) is the set {{2}, {2, 1}}, and (1, 1) is the set {{1}}. Show that such a 
definition (known as the Kuratowski definition of an ordered pair) indeed obeys the property 
(3.5). 

(ii) Suppose we instead define an ordered pair using the alternate definition (x, y) := {x, {x, y}}. 
Show that this definition (known as the short definition of an ordered pair) also verifies (3.5) 
and is thus also an acceptable definition of ordered pair. (Warning: this is tricky; one needs the 
axiom of regularity, and in particular Exercise 3.2.2.) 

(iii) Show that regardless of the definition of ordered pair, the Cartesian product X x Y of any two 
sets X, Y is again a set. (Hint: first use the axiom of replacement to show that for any x ∈ X,
the collection {(x, y) : y ∈ Y} is a set, and then apply the axiom of union.)


Exercise 3.5.2 Suppose we define⁷ an ordered n-tuple to be a surjective function x: {i ∈ N : 1 ≤
i ≤ n} → X whose codomain is some arbitrary set X (so different ordered n-tuples are allowed to
have different ranges); we then write x_i for x(i) and also write x as (x_i)_{1≤i≤n}. Using this definition,
verify that we have (x_i)_{1≤i≤n} = (y_i)_{1≤i≤n} if and only if x_i = y_i for all 1 ≤ i ≤ n. Also, show that
if (X_i)_{1≤i≤n} is an ordered n-tuple of sets, then the Cartesian product, as defined in Definition
3.5.6, is indeed a set. (Hint: use Exercise 3.4.7 and the axiom of specification.)


Exercise 3.5.3 Show that the definitions of equality for ordered pair and ordered n-tuple are con- 
sistent with the reflexivity, symmetry, and transitivity axioms, in the sense that if these axioms are 
assumed to hold for the individual components x, y of an ordered pair (x, y), then they hold for the 
ordered pair itself. 


Exercise 3.5.4 Let A, B, C be sets. Show that A × (B ∪ C) = (A × B) ∪ (A × C), that A × (B ∩
C) = (A × B) ∩ (A × C), and that A × (B\C) = (A × B)\(A × C). (One can of course prove
similar identities in which the roles of the left and right factors of the Cartesian product are reversed.)


7 Technically, this construction of ordered n-tuple is not compatible with the constructions of ordered 
pairs in Exercise 3.5.1, but this does not cause a difficulty in practice; for instance, one can use the 
definition of an ordered 2-tuple here to replace the construction in Exercise 3.5.1, or one can make 
a rather pedantic distinction between an ordered 2-tuple and an ordered pair in one’s mathematical 
arguments. 

Exercise 3.5.5 Let A, B, C, D be sets. Show that (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D). Is it
true that (A × B) ∪ (C × D) = (A ∪ C) × (B ∪ D)? Is it true that (A × B)\(C × D) = (A\C) ×
(B\D)?


Exercise 3.5.6 Let A, B, C, D be non-empty sets. Show that A × B ⊆ C × D if and only if A ⊆ C
and B ⊆ D, and that A × B = C × D if and only if A = C and B = D. What happens if some or
all of the hypotheses that the A, B, C, D are non-empty are removed?


Exercise 3.5.7 Let X, Y be sets, and let π_{X×Y→X}: X × Y → X and π_{X×Y→Y}: X × Y → Y be
the maps π_{X×Y→X}(x, y) := x and π_{X×Y→Y}(x, y) := y; these maps are known as the co-ordinate
functions on X × Y. Show that for any functions f: Z → X and g: Z → Y, there exists a unique
function h: Z → X × Y such that π_{X×Y→X} ∘ h = f and π_{X×Y→Y} ∘ h = g. (Compare this to the
last part of Exercise 3.3.8, and to Exercise 3.1.7.) This function h is known as the pairing of f and
g and is denoted h = (f, g).


Exercise 3.5.8 Let X_1, ..., X_n be sets. Show that the Cartesian product ∏_{i=1}^{n} X_i is empty if and
only if at least one of the X_i is empty.


Exercise 3.5.9 Suppose that I and J are two sets, and for all α ∈ I let A_α be a set, and for all β ∈ J
let B_β be a set. Show that (⋃_{α∈I} A_α) ∩ (⋃_{β∈J} B_β) = ⋃_{(α,β)∈I×J} (A_α ∩ B_β). What happens if one
interchanges all the union and intersection symbols here?


Exercise 3.5.10 If f: X → Y is a function, define the graph of f to be the subset of X × Y defined
by {(x, f(x)) : x ∈ X}.


(i) Show that two functions f: X → Y, f̃: X → Y are equal if and only if they have the same
graph.

(ii) Conversely, if G is any subset of X × Y with the property that for each x ∈ X, the set {y ∈ Y :
(x, y) ∈ G} has exactly one element (or in other words, G obeys the vertical line test), show
that there is exactly one function f: X → Y whose graph is equal to G.

(iii) Suppose we define⁸ a function f to be an ordered triple f = (X, Y, G), where X, Y are sets,
and G is a subset of X × Y that obeys the vertical line test. We then define the domain of such
a triple to be X, the codomain to be Y, and for every x ∈ X, we define f(x) to be the unique
y ∈ Y such that (x, y) ∈ G. Show that this definition is compatible with Definition 3.3.1 in the
sense that every choice of domain X, codomain Y, and property P(x, y) obeying the vertical 
line test produces a function as defined here that obeys all the properties required of it in that 
definition, and is also similarly compatible with Definition 3.3.8. 
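For finite sets, the notions in this exercise can be tried out concretely. The sketch below is an illustration only, under the assumption that X and Y are small finite Python sets; the helper names `graph`, `obeys_vertical_line_test` and `function_from_graph` are hypothetical and do not come from the text.

```python
def graph(f, X):
    """Graph of f on the finite domain X: the set {(x, f(x)) : x in X}."""
    return {(x, f(x)) for x in X}

def obeys_vertical_line_test(G, X, Y):
    """True iff for each x in X exactly one y in Y satisfies (x, y) in G."""
    return all(sum(1 for y in Y if (x, y) in G) == 1 for x in X)

def function_from_graph(G, X, Y):
    """Recover the function (as a dict) whose graph is G; assumes the test passes."""
    return {x: next(y for y in Y if (x, y) in G) for x in X}

X = {0, 1, 2}
Y = {0, 1, 2, 3, 4}
G = graph(lambda x: 2 * x, X)          # graph of x -> 2x on X
recovered = function_from_graph(G, X, Y)
```

Adding a second pair such as (0, 1) to G makes the vertical line test fail, since 0 would then be paired with two elements of Y.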


Exercise 3.5.11 Show that Axiom 3.11 can in fact be deduced from Lemma 3.4.10 and the other 
axioms of set theory, and thus Lemma 3.4.10 can be used as an alternate formulation of the power 
set axiom. (Hint: for any two sets X and Y, use Lemma 3.4.10 and the axiom of specification to 
construct the set of all subsets of X x Y which obey the vertical line test. Then use Exercise 3.5.10 
and the axiom of replacement.) 


Exercise 3.5.12 This exercise will establish a rigorous version of Proposition 2.1.16 that avoids 
circularity (in particular, avoiding the use of any object that required Proposition 2.1.16 to construct). 


(i) Let X be a set, let f: N × X → X be a function, and let c be an element of X. Show that there
exists a function a: N → X such that
a(0) = c
and
a(n++) = f(n, a(n)) for all n ∈ N,


8 Note that this definition is not circular, because the notion of a function was not used to define 
ordered triples or a Cartesian product of two sets. 

and furthermore that this function is unique. (Hint: first show inductively, by a modification
of the proof of Lemma 3.5.11, that for every natural number N ∈ N, there exists a unique
function a_N: {n ∈ N : n ≤ N} → X such that a_N(0) = c and a_N(n++) = f(n, a_N(n)) for
all n ∈ N such that n < N.)

(ii) (Warning: this is challenging.) Prove (i) without using any properties of the natural numbers 
other than the Peano axioms directly (in particular, without using the ordering of the natural 
numbers, and without appealing to Proposition 2.1.16). (Hint: first show inductively, using 
only the Peano axioms and basic set theory, that for every natural number N ∈ N, there exists
a unique pair A_N, B_N of subsets of N which obeys the following properties: (a) A_N ∩ B_N = ∅,
(b) A_N ∪ B_N = N, (c) 0 ∈ A_N, (d) N++ ∈ B_N, (e) Whenever n ∈ B_N, we have n++ ∈ B_N.
(f) Whenever n ∈ A_N and n ≠ N, we have n++ ∈ A_N. Once one obtains these sets, use A_N
as a substitute for {n ∈ N : n ≤ N} in the previous argument.)
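The partial functions a_N from the hint to part (i) are easy to compute for a concrete f and c. The following Python sketch is only an illustration: the name `build_partial` and the choice f(n, x) = (n + 1)x with c = 1 are assumptions made for the example, and Python's built-in integers stand in for N.

```python
def build_partial(f, c, N):
    """Return a_N as a dict on {0, ..., N} with a_N(0) = c and a_N(n+1) = f(n, a_N(n))."""
    a = {0: c}
    for n in range(N):          # n runs over 0, ..., N-1, i.e. all n < N
        a[n + 1] = f(n, a[n])
    return a

# Example: f(n, x) = (n + 1) * x with c = 1 produces the factorials.
a5 = build_partial(lambda n, x: (n + 1) * x, 1, 5)
a3 = build_partial(lambda n, x: (n + 1) * x, 1, 3)

# Partial functions for different N agree on their common domain, which is
# what allows them to be glued together into a single function a: N -> X.
consistent = all(a5[n] == a3[n] for n in a3)
```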


Exercise 3.5.13 The purpose of this exercise is to show that there is essentially only one version 
of the natural number system in set theory (cf. the discussion in Remark 2.1.12). Suppose we have 
a set N′ of “alternative natural numbers”, an “alternative zero” 0′, and an “alternative increment
operation” which takes any alternative natural number n′ ∈ N′ and returns another alternative natural
number n′++′ ∈ N′, such that the Peano axioms (Axioms 2.1-2.5) all hold with the natural numbers,
zero, and increment replaced by their alternative counterparts. Show that there exists a bijection
f: N → N′ from the natural numbers to the alternative natural numbers such that f(0) = 0′, and
such that for any n ∈ N and n′ ∈ N′, we have f(n) = n′ if and only if f(n++) = n′++′. (Hint:
use Exercise 3.5.12.)

In the previous chapter we defined the natural numbers axiomatically, assuming
that they were equipped with a 0 and an increment operation, and assuming five
axioms on these numbers. Philosophically, this is quite different from one of our main 
conceptualizations of natural numbers—that of cardinality, or measuring how many 
elements there are in a set. Indeed, the Peano axiom approach treats natural numbers 
more like ordinals than cardinals. (The cardinals are One, Two, Three, ..., and are 
used to count how many things there are in a set. The ordinals are First, Second, 
Third, ..., and are used to order a sequence of objects. There is a subtle difference 
between the two, especially when comparing infinite cardinals with infinite ordinals, 
but this is beyond the scope of this text.) We paid a lot of attention to what number 
came next after a given number n—which is an operation which is quite natural for 
ordinals, but less so for cardinals—but did not address the issue of whether these 
numbers could be used to count sets. The purpose of this section is to address this 
issue by noting that the natural numbers can be used to count the cardinality of sets, 
as long as the set is finite. 

The first thing is to work out when two sets have the same size. For instance, it 
seems clear that the sets {1, 2, 3} and {4, 5, 6} have the same size, but that both have 
a different size from {8, 9}. As an initial attempt to define a notion of size, we could 
try to say that two sets have the same size if they have the same number of elements, 
but we have not yet defined what the “number of elements” in a set is. Besides, this 
runs into problems when a set is infinite. 

The right way to define the concept of “two sets having the same size” is not
immediately obvious, but can be worked out with some thought. One intuitive reason
why the sets {1, 2, 3} and {4, 5, 6} have the same size is that one can match the
elements of the first set with the elements in the second set in a one-to-one correspondence:
1 ↔ 4, 2 ↔ 5, 3 ↔ 6. (Indeed, this is how we first learn to count a set: we
put the set we are trying to count in correspondence with another set, such as a set of
fingers on one hand.) We will use this intuitive understanding as our rigorous basis for “having
the same size”.


Definition 3.6.1 (Equal cardinality) We say that two sets X and Y have equal cardinality
iff there exists a bijection f: X → Y from X to Y.


Example 3.6.2 The sets {0, 1, 2} and {3, 4, 5} have equal cardinality, since we can
find a bijection between the two sets. Note that we do not yet know whether {0, 1, 2}
and {3, 4} have equal cardinality; we know that one of the functions f from {0, 1, 2}
to {3, 4} is not a bijection, but we have not yet ruled out the possibility that some
other function from one set to the other is a bijection. (It turns out that they do not have equal
cardinality, but we will prove this a little later.) Note that Definition 3.6.1 makes sense
regardless of whether X is finite or infinite (in fact, we haven’t even defined what
finite means yet).
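For very small sets one can simply try every function and see whether any of them is a bijection. The Python sketch below does this; it is an illustration only, the names `is_bijection` and `find_bijection` are hypothetical, and such an exhaustive search is no substitute for the proof given later in this section.

```python
from itertools import product

def is_bijection(f, X, Y):
    """f is a dict from X to Y; check that it is one-to-one and onto."""
    images = [f[x] for x in X]
    return len(set(images)) == len(images) and set(images) == set(Y)

def find_bijection(X, Y):
    """Try every function from X to Y; return the first bijection found, else None."""
    X, Y = sorted(X), sorted(Y)
    for values in product(Y, repeat=len(X)):
        f = dict(zip(X, values))
        if is_bijection(f, X, Y):
            return f
    return None

same_size = find_bijection({0, 1, 2}, {3, 4, 5})
different_size = find_bijection({0, 1, 2}, {3, 4})
```

Here `same_size` is an explicit bijection, while `different_size` is `None` because none of the 2³ = 8 functions from {0, 1, 2} to {3, 4} is one-to-one.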


Remark 3.6.3 The fact that two sets have equal cardinality does not preclude one of 
the sets from containing the other. For instance, if X is the set of natural numbers and 
Y is the set of even⁹ natural numbers, then the map f: X → Y defined by f(n) := 2n
is a bijection from X to Y (why?), and so X and Y have equal cardinality, despite Y 
being a subset of X and seeming intuitively as if it should only have “half” of the 
elements of X. 


The notion of having equal cardinality is an equivalence relation: 


Proposition 3.6.4 Let X, Y, Z be sets. Then X has equal cardinality with X. If X 
has equal cardinality with Y, then Y has equal cardinality with X. If X has equal 
cardinality with Y and Y has equal cardinality with Z, then X has equal cardinality 
with Z. 


Proof See Exercise 3.6.1. 


Let n be a natural number. Now we want to say when a set X has n elements. 
Certainly we want the set {i ∈ N : 1 ≤ i ≤ n} = {1, 2, ..., n} to have n elements.
(This is true even when n = 0; the set {i ∈ N : 1 ≤ i ≤ 0} is just the empty set.)
Using our notion of equal cardinality, we thus define: 


Definition 3.6.5 Let n be a natural number. A set X is said to have cardinality n iff
it has equal cardinality with {i ∈ N : 1 ≤ i ≤ n}. We also say that X has n elements
iff it has cardinality n.


9 A natural number is even if it is of the form 2n for some natural number n. 

Remark 3.6.6 One can use the set {i ∈ N : i < n} instead of {i ∈ N : 1 ≤ i ≤ n},
since these two sets clearly have equal cardinality. (Why? What is the bijection?)


Example 3.6.7 Let a, b, c, d be distinct objects. Then {a, b, c, d} has the same cardinality
as {i ∈ N : i < 4} = {0, 1, 2, 3} or {i ∈ N : 1 ≤ i ≤ 4} = {1, 2, 3, 4} and thus
has cardinality 4. Similarly, the set {a} has cardinality 1.


There might be one problem with this definition: a set might have two different 
cardinalities. But this is not possible: 


Proposition 3.6.8 (Uniqueness of cardinality) Let X be a set with some cardinality 
n. Then X cannot have any other cardinality, i.e., X cannot have cardinality m for 
any m ≠ n.


Before we prove this proposition, we need a lemma. 


Lemma 3.6.9 Suppose that n ≥ 1, and X has cardinality n. Then X is non-empty,
and if x is any element of X, then the set X − {x} (i.e., X with the element x removed)
has cardinality¹⁰ n − 1.


Proof If X is empty then it clearly cannot have the same cardinality as the non-empty
set {i ∈ N : 1 ≤ i ≤ n}, as there is no bijection from the empty set to a non-empty
set (why?). Now let x be an element of X. Since X has the same cardinality
as {i ∈ N : 1 ≤ i ≤ n}, we thus have a bijection f from X to {i ∈ N : 1 ≤ i ≤ n}.
In particular, f(x) is a natural number between 1 and n. Now define the function
g: X − {x} → {i ∈ N : 1 ≤ i ≤ n − 1} by the following rule: for any y ∈ X − {x},
we define g(y) := f(y) if f(y) < f(x), and define g(y) := f(y) − 1 if f(y) >
f(x). (Note that f(y) cannot equal f(x) since y ≠ x and f is a bijection.) It is easy
to check that this map is also a bijection (why?), and so X − {x} has equal cardinality
with {i ∈ N : 1 ≤ i ≤ n − 1}. In particular X − {x} has cardinality n − 1, as
desired.
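The construction of g from f in this proof is entirely explicit, and can be sketched in Python for a small example. This is only an illustration: the bijection f is stored as a dictionary, and the name `remove_element` and the sample set {a, b, c, d} are assumptions made for the example.

```python
def remove_element(f, x):
    """Given a bijection f: X -> {1, ..., n} (as a dict) and x in X,
    build g: X - {x} -> {1, ..., n-1} following the rule in the proof."""
    fx = f[x]
    g = {}
    for y, fy in f.items():
        if y == x:
            continue                      # x itself is removed from the domain
        g[y] = fy if fy < fx else fy - 1  # shift down the labels above f(x)
    return g

f = {"a": 1, "b": 2, "c": 3, "d": 4}      # a bijection from {a, b, c, d} to {1, 2, 3, 4}
g = remove_element(f, "b")
```

The labels below f(x) are kept, and the labels above f(x) are shifted down by one to close the gap, so that g takes each value in {1, 2, 3} exactly once.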


Now we prove the proposition. 


Proof of Proposition 3.6.8 We induct on n. First suppose that n = 0. Then X must
be empty, and so X cannot have any non-zero cardinality. Now suppose that the
proposition is already proven for some n; we now prove it for n++. Let X have
cardinality n++, and suppose that X also has some other cardinality m ≠ n++.
By Lemma 3.6.9, X is non-empty (so that m ≥ 1, since only the empty set has
cardinality 0), and if x is any element of X, then X − {x}
has cardinality n and also has cardinality m − 1, again by Lemma 3.6.9. By the induction
hypothesis, this means that n = m − 1, which implies that m = n++, a contradiction.
This closes the induction.


10 Strictly speaking, n − 1 has not yet been defined in this text. For the purposes of this lemma, we
define n − 1 to be the unique natural number m such that m++ = n; this m is given by Lemma
2.2.10.

Thus, for instance, we now know, thanks to Propositions 3.6.4 and 3.6.8, that the
sets {0, 1, 2} and {3, 4} do not have equal cardinality, since the first set has cardinality 
3 and the second set has cardinality 2. 


Definition 3.6.10 (Finite sets). A set is finite iff it has cardinality n for some natural 
number n; otherwise, the set is called infinite. If X is a finite set, we use #(X) to
denote the cardinality of X. 


Example 3.6.11 The sets {0, 1, 2} and {3, 4} are finite, as is the empty set (0 is a 
natural number), and #({0, 1, 2}) = 3, #({3, 4}) = 2, and #(∅) = 0.


Now we give an example of an infinite set.


Theorem 3.6.12 The set of natural numbers N is infinite.


Proof Suppose for sake of contradiction that the set of natural numbers N was
finite, so it had some cardinality #(N) = n. By Lemma 3.6.9, N\{0} would then have
cardinality n − 1. But N has equal cardinality with N\{0} (using x ↦ x + 1 as the
bijection from the former to the latter), hence n = n − 1, which gives the desired
contradiction.


Remark 3.6.13 One can also use similar arguments to show that any unbounded 
set¹¹ is infinite; for instance the rationals Q and the reals R (which we will construct
in later chapters) are infinite. However, it is possible for some sets to be “more” 
infinite than others; see Sect. 8.3. 


Now we relate cardinality with the arithmetic of natural numbers. 


Proposition 3.6.14 (Cardinal arithmetic). 


(a) Let X be a finite set, and let x be an object which is not an element of X. Then
X ∪ {x} is finite and #(X ∪ {x}) = #(X) + 1.

(b) Let X and Y be finite sets. Then X ∪ Y is finite and #(X ∪ Y) ≤ #(X) + #(Y).
If in addition X and Y are disjoint (i.e., X ∩ Y = ∅), then #(X ∪ Y) = #(X) +
#(Y).

(c) Let X be a finite set, and let Y be a subset of X. Then Y is finite, and #(Y) ≤ #(X).
If in addition Y ≠ X (i.e., Y is a proper subset of X), then we have #(Y) < #(X).

(d) If X is a finite set, and f: X → Y is a function, then f(X) is a finite set with
#(f(X)) ≤ #(X). One has equality #(f(X)) = #(X) if and only if f is one-to-one.

(e) Let X and Y be finite sets. Then the Cartesian product X × Y is finite and #(X ×
Y) = #(X) × #(Y).

(f) Let X and Y be finite sets. Then the set Y^X (defined in Axiom 3.11) is finite and
#(Y^X) = #(Y)^{#(X)}.


Proof See Exercise 3.6.4. 
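Several parts of Proposition 3.6.14 can be spot-checked on small examples with Python's built-in sets. The sketch below is an illustration, not a proof; the particular sets X, Y, Z are arbitrary choices made for the example, and a function from X to Y is represented (as an assumption of this sketch) by its graph, stored as a frozenset of pairs.

```python
from itertools import product

X = {"a", "b", "c"}
Y = {0, 1}
Z = {"c", "d"}

# (e) The Cartesian product X x Y as a set of ordered pairs.
cartesian = set(product(X, Y))

# (f) The set Y^X of all functions from X to Y, each stored as its graph.
xs = sorted(X)
functions = {
    frozenset(zip(xs, values))
    for values in product(sorted(Y), repeat=len(xs))
}

# (b) Unions: X and Z overlap in "c", while X and Y are disjoint.
overlapping_union = X | Z
disjoint_union = X | Y
```

With #(X) = 3 and #(Y) = 2, the product has 3 × 2 = 6 elements and Y^X has 2³ = 8 elements, as parts (e) and (f) predict.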


11 The notion of a bounded or unbounded set is defined in Definition 9.1.22.

Remark 3.6.15 Proposition 3.6.14 suggests that there is another way to define the
arithmetic operations of natural numbers; not defined recursively as in Definitions
2.2.1, 2.3.1, 2.3.11, but instead using the notions of union, Cartesian product, and
power set. This is the basis of cardinal arithmetic, which is an alternative foundation
for arithmetic to the Peano arithmetic we have developed here; we will not develop
this arithmetic in this text, but we give some examples of how one would work with
this arithmetic in Exercises 3.6.5, 3.6.6.


This concludes our discussion of finite sets. We shall discuss infinite sets in Chap. 
8, once we have constructed a few more examples of infinite sets (such as the integers, 
rationals, and reals). 


— Exercises — 

Exercise 3.6.1 Prove Proposition 3.6.4.


Exercise 3.6.2 Show that a set X has cardinality 0 if and only if X is the empty set.


Exercise 3.6.3 Let n be a natural number, and let f: {i ∈ N : 1 ≤ i ≤ n} → N be a function. Show
that there exists a natural number M such that f(i) ≤ M for all 1 ≤ i ≤ n. (Hint: induct on n. You
may also want to peek at Lemma 5.1.14.) Thus finite subsets of the natural numbers are bounded. 
Use this to give an alternate proof of Theorem 3.6.12 that does not use Lemma 3.6.9. 


Exercise 3.6.4 Prove Proposition 3.6.14. 


Exercise 3.6.5 Let A and B be sets. Show that A × B and B × A have equal cardinality by constructing
an explicit bijection between the two sets. Then use Proposition 3.6.14 to give an
alternate proof of Lemma 2.3.2.


Exercise 3.6.6 Let A, B, C be sets. Show that the sets (A^B)^C and A^{B×C} have equal cardinality by
constructing an explicit bijection between the two sets. Conclude that (a^b)^c = a^{bc} for any natural
numbers a, b, c. Use a similar argument to also conclude a^b × a^c = a^{b+c}.


Exercise 3.6.7 Let A and B be sets. Let us say that A has lesser or equal cardinality to B if there 
exists an injection f: A — B from A to B. Show that if A and B are finite sets, then A has lesser 
or equal cardinality to B if and only if #(A) ≤ #(B).


Exercise 3.6.8 Let A and B be sets such that there exists an injection f: A → B from A to B (i.e.,
A has lesser or equal cardinality to B). Assume also that A is non-empty. Show that there exists a 
surjection g: B — A from B to A. (The converse to this statement requires the axiom of choice; 
see Exercise 8.4.3.) 


Exercise 3.6.9 Let A and B be finite sets. Show that A ∪ B and A ∩ B are also finite sets, and that
#(A) + #(B) = #(A ∪ B) + #(A ∩ B).


Exercise 3.6.10 Let A_1, ..., A_n be finite sets such that #(⋃_{i∈{1,...,n}} A_i) > n. Show that there
exists i ∈ {1, ..., n} such that #(A_i) ≥ 2. (This is known as the pigeonhole principle.)


Exercise 3.6.11 Let f: X → Y be a function between two sets X, Y. Show that the following are
equivalent:


(a) f is injective.
(b) Whenever E ⊆ X has cardinality #(E) equal to 2, then the image f(E) also has cardinality
#(f(E)) = 2.

(Note that if X has cardinality less than 2 then the claim in (b) is vacuously true; nevertheless, the
equivalence still holds in this case!) Because of this equivalence, one could refer to an injective
function as a two-to-two function. (This observation is due to John Conway (1937–2020).)


Exercise 3.6.12 For any natural number n, let S_n be the set of all bijections φ: {i ∈ N : 1 ≤ i ≤
n} → {i ∈ N : 1 ≤ i ≤ n} from the set {i ∈ N : 1 ≤ i ≤ n} to itself (such bijections are also known
as permutations of {i ∈ N : 1 ≤ i ≤ n}).


(i) For any natural number n, show that S_n is finite, and #(S_{n++}) = (n++) × #(S_n). (Hint: partition
S_{n++} into n++ subsets, depending on the value φ(n++) that a permutation φ: {i ∈ N : 1 ≤
i ≤ n++} → {i ∈ N : 1 ≤ i ≤ n++} of the set {i ∈ N : 1 ≤ i ≤ n++} assigns to n++.)

(ii) Define the factorial n! of a natural number n recursively by 0! := 1 and (n++)! := (n++) × n!
for all natural numbers n. Show that #(S_n) = n! for all natural numbers n.
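The claim in part (ii) can be checked numerically for the first few values of n by listing the permutations directly. The Python sketch below is an illustration only (it verifies finitely many cases and proves nothing); the function names are hypothetical, and `itertools.permutations` from the standard library is used to enumerate the bijections.

```python
from itertools import permutations

def count_permutations(n):
    """#(S_n): count the bijections from {1, ..., n} to itself by enumeration."""
    return sum(1 for _ in permutations(range(1, n + 1)))

def factorial(n):
    """Recursive definition from part (ii): 0! = 1 and (n++)! = (n++) * n!."""
    return 1 if n == 0 else n * factorial(n - 1)

# Triples (n, #(S_n), n!) for n = 0, ..., 5.
checks = [(n, count_permutations(n), factorial(n)) for n in range(6)]
```

Note that S_0 has exactly one element, namely the empty function, which matches 0! = 1.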


Chapter 4
Integers and Rationals


4.1 The Integers 


In Chap. 2 we built up most of the basic properties of the natural number system, but 
we have reached the limits of what one can do with just addition and multiplication. 
We would now like to introduce a new operation, that of subtraction, but to do that 
properly we will have to pass from the natural number system to a larger number 
system, that of the integers. 

Informally, the integers are what you can get by subtracting two natural numbers;
for instance, 3 − 5 should be an integer, as should 6 − 2. This is not a complete definition
of the integers, because (a) it doesn’t say when two differences are equal (for
instance we should know why 3 − 5 is equal to 2 − 4, but is not equal to 1 − 6), and
(b) it doesn’t say how to do arithmetic on these differences (how does one add 3 − 5
to 6 − 2?). Furthermore, (c) this definition is circular because it requires a notion of
subtraction, which we can only adequately define once the integers are constructed.
Fortunately, because of our prior experience with integers we know what the answers
to these questions should be. To answer (a), we know from our advanced knowledge
in algebra that a − b = c − d happens exactly when a + d = c + b, so we can
characterize equality of differences using only the concept of addition. Similarly,
to answer (b) we know from algebra that (a − b) + (c − d) = (a + c) − (b + d)
and that (a − b)(c − d) = (ac + bd) − (ad + bc). So we will take advantage of
our foreknowledge by building all this into the definition of the integers, as we shall do
shortly.

We still have to resolve (c). To get around this problem we will use the following
workaround: we will temporarily write integers not as a difference a − b, but instead
use a new notation a—b to define integers, where the — is a meaningless placeholder,
similar to the comma in the Cartesian co-ordinate notation (x, y) for points
in the plane. Later when we define subtraction we will see that a—b is in fact equal
to a − b, and so we can discard the notation —; it is only needed right now to avoid
circularity. (These devices are similar to the scaffolding used to construct a building;
they are temporarily essential to make sure the building is built correctly, but once
the building is completed they are thrown away and never used again.) This may
seem unnecessarily complicated in order to define something that we already are
very familiar with, but we will use this device again to construct the rationals, and
knowing these kinds of constructions will be very helpful in later chapters.


Definition 4.1.1 (Integers). An integer is an expression¹ of the form a—b, where a
and b are natural numbers. Two integers are considered to be equal, a—b = c—d,
if and only if a + d = c + b. We let Z denote the set of all integers.


Thus for instance 3—5 is an integer, and is equal to 2—4, because 3 + 4 = 2 + 5.
On the other hand, 3—5 is not equal to 2—3 because 3 + 3 ≠ 2 + 5. This notation
is strange looking and has a few deficiencies; for instance, 3 is not yet an integer,
because it is not of the form a—b! We will rectify these problems later.

We have to check that this is a legitimate notion of equality. We need to verify 
the reflexivity, symmetry, transitivity, and substitution axioms (see Sect. A.7). We 
leave reflexivity and symmetry to Exercise 4.1.1 and instead verify the transitivity 
axiom. Suppose we know that a—b = c—d and c—d = e—f. Then we have
a + d = c + b and c + f = d + e. Adding the two equations together we obtain
a + d + c + f = c + b + d + e. By Proposition 2.2.6 we can cancel the c and d,
obtaining a + f = b + e, i.e., a—b = e—f. Thus the cancellation law was needed
to make sure that our notion of equality is sound. As for the substitution axiom, we 
cannot verify it at this stage because we have not yet defined any operations on the 
integers. However, when we do define our basic operations on the integers, such 
as addition, multiplication, and order, we will have to verify the substitution axiom 
at that time in order to ensure that the definition is valid. (We will only need to 
do this for the basic operations; more advanced operations on the integers, such as 
exponentiation, will be defined in terms of the basic ones, and so we do not need to 
reverify the substitution axiom for the advanced operations.) 
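The notion of equality in Definition 4.1.1 is easy to experiment with. In the Python sketch below, the formal expression a—b is modelled (as an assumption of this sketch) by the tuple `(a, b)`, and the hypothetical function `int_eq` implements the rule a + d = c + b. The exhaustive check over small values only illustrates reflexivity, symmetry and transitivity; it does not replace the arguments in the text.

```python
from itertools import product

def int_eq(x, y):
    """Definition 4.1.1: a—b = c—d iff a + d = c + b, with x = (a, b), y = (c, d)."""
    (a, b), (c, d) = x, y
    return a + d == c + b

# All formal differences a—b with 0 <= a, b <= 3.
pairs = list(product(range(4), repeat=2))

reflexive = all(int_eq(x, x) for x in pairs)
symmetric = all(int_eq(y, x) for x in pairs for y in pairs if int_eq(x, y))
transitive = all(
    int_eq(x, z)
    for x in pairs
    for y in pairs
    for z in pairs
    if int_eq(x, y) and int_eq(y, z)
)
```

For instance `int_eq((3, 5), (2, 4))` holds because 3 + 4 = 2 + 5, while `int_eq((3, 5), (2, 3))` fails because 3 + 3 ≠ 2 + 5.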

Now we define two basic arithmetic operations on integers: addition and multiplication.


Definition 4.1.2 The sum of two integers, (a—b) + (c—d), is defined by the formula

(a—b) + (c—d) := (a + c)—(b + d).


The product of two integers, (a—b) × (c—d), is defined by


(a—b) × (c—d) := (ac + bd)—(ad + bc).


1 In the language of set theory, what we are doing here is starting with the space N × N of ordered
pairs (a, b) of natural numbers. Then we place an equivalence relation ∼ on these pairs by declaring
(a, b) ∼ (c, d) iff a + d = c + b. The set-theoretic interpretation of the symbol a—b is that it is the
space of all pairs equivalent to (a, b): a—b := {(c, d) ∈ N × N : (a, b) ∼ (c, d)}; the existence of
the set Z = {a—b : (a, b) ∈ N × N} of integers then follows from two applications of the axiom
of replacement. However, this interpretation plays no role in how we manipulate the integers and 
we will not refer to it again. A similar set-theoretic interpretation can be given to the construction 
of the rational numbers later in this chapter, or the real numbers in the next chapter. 

Thus for instance, (3—5) + (1—4) is equal to (4—9). There is however one
thing we have to check before we can accept these definitions: we have to check
that if we replace one of the integers by an equal integer, then the sum or product does
not change. For instance, (3—5) is equal to (2—4), so (3—5) + (1—4) ought to
have the same value as (2—4) + (1—4), otherwise this would not give a consistent
definition of addition. Fortunately, this is the case:


Lemma 4.1.3 (Addition and multiplication are well-defined). Let a, b, a′, b′, c, d
be natural numbers. If (a—b) = (a′—b′), then (a—b) + (c—d) = (a′—b′) +
(c—d) and (a—b) × (c—d) = (a′—b′) × (c—d), and also (c—d) + (a—b)
= (c—d) + (a′—b′) and (c—d) × (a—b) = (c—d) × (a′—b′). Thus addition
and multiplication are well-defined operations (equal inputs give equal outputs).


Proof To prove that (a—b) + (c—d) = (a′—b′) + (c—d), we evaluate both
sides as (a + c)—(b + d) and (a′ + c)—(b′ + d). Thus we need to show that
a + c + b′ + d = a′ + c + b + d. But since (a—b) = (a′—b′), we have a + b′ =
a′ + b, and so by adding c + d to both sides we obtain the claim. Now we
show that (a—b) × (c—d) = (a′—b′) × (c—d). Both sides evaluate to (ac +
bd)—(ad + bc) and (a′c + b′d)—(a′d + b′c), so we have to show that ac +
bd + a′d + b′c = a′c + b′d + ad + bc. But the left-hand side factors as c(a + b′) +
d(a′ + b), while the right factors as c(a′ + b) + d(a + b′). Since a + b′ = a′ + b,
the two sides are equal. The other two identities are proven similarly.
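Lemma 4.1.3 can be spot-checked by brute force on small inputs. In the sketch below, a—b is again modelled by the tuple `(a, b)` (an assumption of this sketch), and the hypothetical functions `add` and `mul` implement the formulas of Definition 4.1.2. The check covers only finitely many cases and is an illustration rather than a proof.

```python
from itertools import product

def int_eq(x, y):
    """a—b = c—d iff a + d = c + b."""
    (a, b), (c, d) = x, y
    return a + d == c + b

def add(x, y):
    """(a—b) + (c—d) := (a + c)—(b + d)."""
    (a, b), (c, d) = x, y
    return (a + c, b + d)

def mul(x, y):
    """(a—b) x (c—d) := (ac + bd)—(ad + bc)."""
    (a, b), (c, d) = x, y
    return (a * c + b * d, a * d + b * c)

pairs = list(product(range(4), repeat=2))

# Replacing x by an equal integer x2 must not change sums or products with z.
well_defined = all(
    int_eq(add(x, z), add(x2, z))
    and int_eq(mul(x, z), mul(x2, z))
    and int_eq(add(z, x), add(z, x2))
    and int_eq(mul(z, x), mul(z, x2))
    for x in pairs
    for x2 in pairs
    if int_eq(x, x2)
    for z in pairs
)
```

In particular `add((3, 5), (1, 4))` returns `(4, 9)`, and this is equal (in the sense of `int_eq`) to `add((2, 4), (1, 4))`, as in the example preceding the lemma.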


The integers n—0 behave in the same way as the natural numbers n; indeed
one can check that (n—0) + (m—0) = (n + m)—0 and (n—0) × (m—0) =
nm—0. Furthermore, (n—0) is equal to (m—0) if and only if n = m. (The mathematical
term for this is that there is an isomorphism between the natural numbers
n and those integers of the form n—0.) Thus we may identify the natural numbers
with integers by setting n = n—0; this does not affect our definitions of addition
or multiplication or equality since they are consistent with each other. For instance
the natural number 3 is now considered to be the same as the integer 3—0, thus
3 = 3—0. In particular 0 is equal to 0—0 and 1 is equal to 1—0. Of course, if we
set n equal to n—0, then it will also be equal to any other integer which is equal to
n—0, for instance 3 is equal not only to 3—0, but also to 4—1, 5—2, etc.

We can now define incrementation on the integers by defining x++ := x + 1
for any integer x; this is of course consistent with our definition of the increment
operation for natural numbers. However, this is no longer an important operation for
us, as it has been now superseded by the more general notion of addition.

Now we consider some other basic operations on the integers. 


Definition 4.1.4 (Negation of integers). If (a—b) is an integer, we define the negation
−(a—b) to be the integer (b—a). In particular if n = n—0 is a positive
natural number, we can define its negation −n = 0—n.


For instance −(3—5) = (5—3). One can check this definition is well-defined
(Exercise 4.1.2). 
We can now show that the integers correspond exactly to what we expect. 

Lemma 4.1.5 (Trichotomy of integers). Let x be an integer. Then exactly one of the
following three statements is true: (a) x is zero; (b) x is equal to a positive natural
number n; or (c) x is the negation −n of a positive natural number n.


Proof We first show that at least one of (a), (b), (c) is true. By definition, x = a—b
for some natural numbers a, b. We have three cases: a > b, a = b, or a < b. If
a > b then a = b + c for some positive natural number c, which means that a—b =
c—0 = c, which is (b). If a = b, then a—b = a—a = 0—0 = 0, which is (a).
If a < b, then b > a, so that b—a = n for some positive natural number n by the previous
reasoning, and thus a—b = −n, which is (c).

Now we show that no more than one of (a), (b), (c) can hold at a time. By definition,
a positive natural number is non-zero, so (a) and (b) cannot simultaneously be true.
If (a) and (c) were simultaneously true, then 0 = −n for some positive natural n;
thus (0—0) = (0—n), so that 0 + n = 0 + 0, so that n = 0, a contradiction. If
(b) and (c) were simultaneously true, then n = −m for some positive n, m, so that
(n—0) = (0—m), so that n + m = 0 + 0, which contradicts Proposition 2.2.8.
Thus exactly one of (a), (b), (c) is true for any integer x.
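The case analysis in the first half of this proof amounts to a small classification procedure. The Python sketch below is one possible rendering of it, with a—b modelled by the tuple `(a, b)`; the name `classify` and the string labels are assumptions of this sketch, and Python's built-in subtraction is used only as a shortcut for finding the natural number c with a = b + c.

```python
def classify(x):
    """Return ('zero', 0), ('positive', n) or ('negative', n) for x = (a, b), meaning a—b."""
    a, b = x
    if a == b:
        return ("zero", 0)              # a—a = 0—0 = 0
    if a > b:
        return ("positive", a - b)      # a = b + c with c positive, so a—b = c—0 = c
    return ("negative", b - a)          # b—a = n—0 for positive n, so a—b = -n

examples = [classify((6, 2)), classify((4, 4)), classify((3, 5))]
```

Each input falls into exactly one branch, mirroring the statement that exactly one of (a), (b), (c) holds.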


If n is a positive natural number, we call n a positive integer, and —n a negative 
integer. Thus every integer is positive, zero, or negative, but not more than one of 
these at a time. 

One could well ask why we don’t use Lemma 4.1.5 to define the integers; i.e., why 
didn’t we just say an integer is anything which is either a positive natural number, 
zero, or the negative of a natural number. The reason is that if we did so, the rules for 
adding and multiplying integers would split into many different cases (e.g., negative 
times positive equals negative; negative plus positive is either negative, positive, or
zero, depending on which term is larger, etc.) and to verify all the properties would 
end up being much messier. 

We now summarize the algebraic properties of the integers. 


Proposition 4.1.6 (Laws of algebra for integers). Let x, y, z be integers. Then we 
have 


x + y = y + x
(x + y) + z = x + (y + z)
x + 0 = 0 + x = x
x + (−x) = (−x) + x = 0
xy = yx
(xy)z = x(yz)
x1 = 1x = x
x(y + z) = xy + xz
(y + z)x = yx + zx.

Remark 4.1.7 The above set of nine identities has a name; they are asserting that
the integers form a commutative ring. (If one deleted the identity xy = yx, then they
would only assert that the integers form a ring.) Note that some of these identities
were already proven for the natural numbers, but this does not automatically mean
that they also hold for the integers because the integers are a larger set than the natural
numbers. On the other hand, this proposition supersedes many of the propositions
derived earlier for natural numbers.


Proof There are two ways to prove these identities. One is to use Lemma 4.1.5 and 
split into a lot of cases depending on whether x, y, z are zero, positive, or negative. 
This becomes very messy. A shorter way is to write x = (a--b), y = (c--d), and 
z = (e--f) for some natural numbers a, b, c, d, e, f, and expand these identities 
in terms of a, b, c, d, e, f and use the algebra of the natural numbers. This allows 
each identity to be proven in a few lines. We shall just prove the longest one, namely 
(xy)z = x(yz): 


(xy)z = ((a--b)(c--d))(e--f)
      = ((ac + bd)--(ad + bc))(e--f)
      = ((ace + bde + adf + bcf)--(acf + bdf + ade + bce));

x(yz) = (a--b)((c--d)(e--f))
      = (a--b)((ce + df)--(cf + de))
      = ((ace + adf + bcf + bde)--(acf + ade + bce + bdf))


and so one can see that (xy)z and x(yz) are equal. The other identities are proven in 
a similar fashion; see Exercise 4.1.4. 
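
The computation above can also be spot-checked by machine. The following is only a sketch, under the assumption that an integer a--b is modelled as a pair (a, b) of natural numbers; the names eq, mul and check_associativity are illustrative and do not appear elsewhere in the text. A finite check of this kind is of course no substitute for the proof.

```python
from itertools import product

# Assumption: the formal difference a--b is modelled as the pair (a, b)
# of natural numbers. The function names are illustrative only.

def eq(x, y):
    """Equality of formal differences: a--b = c--d iff a + d = c + b."""
    (a, b), (c, d) = x, y
    return a + d == c + b

def mul(x, y):
    """Product of formal differences: (a--b)(c--d) := (ac + bd)--(ad + bc)."""
    (a, b), (c, d) = x, y
    return (a * c + b * d, a * d + b * c)

def check_associativity(bound):
    """Test (xy)z = x(yz) for all pairs with entries below `bound`."""
    pairs = list(product(range(bound), repeat=2))
    return all(
        eq(mul(mul(x, y), z), mul(x, mul(y, z)))
        for x in pairs for y in pairs for z in pairs
    )

print(check_associativity(3))  # True
```

Equality is tested with eq rather than ==, since different pairs can represent the same integer.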


We now define the operation of subtraction x - y of two integers by the formula 
x - y := x + (-y). 


We do not need to verify the substitution axiom for this operation, since we have 
defined subtraction in terms of two other operations on integers, namely addition and 
negation, and we have already verified that those operations are well-defined. 

One can easily check now that if a and b are natural numbers, then 


a - b = a + (-b) = (a--0) + (0--b) = a--b, 


and so a - b is just the same thing as a--b. Because of this we can now discard the 
-- notation, and use the familiar operation of subtraction instead. 

We can now generalize Lemma 2.3.3 and Corollary 2.3.7 from the natural numbers 
to the integers: 


Proposition 4.1.8 (Integers have no zero divisors) Let a and b be integers such that 
ab = 0. Then either a = 0 or b = 0 (or both). 


Proof See Exercise 4.1.5. 


Corollary 4.1.9 (Cancellation law for integers) If a, b, c are integers such that 
ac = bc and c is non-zero, then a = b. 


Proof See Exercise 4.1.6. 


We now extend the notion of order, which was defined on the natural numbers, to 
the integers by repeating the definition verbatim: 


Definition 4.1.10 (Ordering of the integers) Let n and m be integers. We say that n 
is greater than or equal to m, and write n ≥ m or m ≤ n, iff we have n = m + a for 
some natural number a. We say that n is strictly greater than m, and write n > m or 
m < n, iff n ≥ m and n ≠ m. 


Thus for instance 5 > -3, because 5 = -3 + 8 and 5 ≠ -3. Clearly this definition 
is consistent with the notion of order on the natural numbers, since we are using 
the same definition. 

Using the laws of algebra in Proposition 4.1.6 it is not hard to show the following 
properties of order: 


Lemma 4.1.11 (Properties of order). Let a, b, c be integers. 


(a) a > b if and only if a - b is a positive natural number. 

(b) (Addition preserves order) If a > b, then a + c > b + c. 

(c) (Positive multiplication preserves order) If a > b and c is positive, then ac > bc. 

(d) (Negation reverses order) If a > b, then -a < -b. 

(e) (Order is transitive) If a > b and b > c, then a > c. 

(f) (Order trichotomy) Exactly one of the statements a > b, a < b, or a = b is true. 


Proof See Exercise 4.1.7. 


— Exercises — 
Exercise 4.1.1 Verify that the definition of equality on the integers is both reflexive and symmetric. 


Exercise 4.1.2 Show that the definition of negation on the integers is well-defined in the sense that 
if (a--b) = (a'--b'), then -(a--b) = -(a'--b') (so equal integers have equal negations). 


Exercise 4.1.3 Show that (-1) × a = -a for every integer a. 


Exercise 4.1.4 Prove the remaining identities in Proposition 4.1.6. (Hint: one can save some work 
by using some identities to prove others. For instance, once you know that xy = yx, you get for 
free that x1 = 1x, and once you also prove x(y + z) = xy + xz, you automatically get (y + z)x = 
yx + zx for free.) 


Exercise 4.1.5 Prove Proposition 4.1.8. (Hint: while this proposition is not quite the same as Lemma 
2.3.3, it is certainly legitimate to use Lemma 2.3.3 in the course of proving Proposition 4.1.8.) 


Exercise 4.1.6 Prove Corollary 4.1.9. (Hint: there are two ways to do this. One is to use Proposition 
4.1.8 to conclude that a - b must be zero. Another way is to combine Corollary 2.3.7 with Lemma 
4.1.5.) 


Exercise 4.1.7 Prove Lemma 4.1.11. (Hint: use the first part of this lemma to prove all the others.) 


Exercise 4.1.8 Show that the principle of induction (Axiom 2.5) does not apply directly to the 
integers. More precisely, give an example of a property P(n) pertaining to an integer n such that 
P(0) is true, and that P(n) implies P(n++) for all integers n, but that P(n) is not true for all 
integers n. Thus induction is not as useful a tool for dealing with the integers as it is with the natural 
numbers. (The situation becomes even worse with the rational and real numbers, which we shall 
define shortly.) 


Exercise 4.1.9 Show that the square of an integer is always a natural number. That is to say, prove 
that n^2 ≥ 0 for every integer n. 


4.2 The Rationals 


We have now constructed the integers, with the operations of addition, subtraction, 
multiplication, and order and verified all the expected algebraic and order-theoretic 
properties. Now we will use a similar construction to build the rationals, adding 
division to our mix of operations. 

Just as the integers were constructed by subtracting two natural numbers, the 
rationals can be constructed by dividing two integers, though of course we have to 
make the usual caveat that the denominator should be non-zero.² Of course, just as 
two differences a - b and c - d can be equal if a + d = c + b, we know (from more 
advanced knowledge) that two quotients a/b and c/d can be equal if ad = bc. Thus, 
in analogy with the integers, we create a new meaningless symbol // (which will 
eventually be superseded by division), and define 


Definition 4.2.1 A rational number is an expression of the form a//b, where a and 
b are integers and b is non-zero; a//0 is not considered to be a rational number. Two 
rational numbers are considered to be equal, a//b = c//d, if and only if ad = cb. 
The set of all rational numbers is denoted Q. 


Thus for instance 3//4 = 6//8 = -3//-4, but 3//4 ≠ 4//3. This is a valid definition 
of equality (Exercise 4.2.1). Now we need a notion of addition, multiplication, 
and negation. Again, we will take advantage of our pre-existing knowledge, which 
tells us that a/b + c/d should equal (ad + bc)/(bd) and that a/b * c/d should equal 
ac/bd, while -(a/b) equals (-a)/b. Motivated by this foreknowledge, we define 


Definition 4.2.2 If a//b and c//d are rational numbers, we define their sum 


(a//b) + (c//d) := (ad + bc)//(bd) 


their product 


(a//b) * (c//d) := (ac)//(bd) 


and the negation 


-(a//b) := (-a)//b. 


² There is no reasonable way we can divide by zero, since one cannot have both the identities 
(a/b) * b = a and c * 0 = 0 hold simultaneously if b is allowed to be zero and a is non-zero. 
Similarly, the identities a/a = 1 and 2 * (a/a) = (2 * a)/a cannot simultaneously hold if 0/0 is 
defined. However, we can eventually get a reasonable notion of dividing by a quantity which 
approaches zero (think of L'Hôpital's rule; see Sect. 10.5), which suffices for doing things like 
defining differentiation. 


Note that if b and d are non-zero, then bd is also non-zero, by Proposition 4.1.8, 
so the sum or product of two rational numbers remains a rational number. 


Lemma 4.2.3 The sum, product, and negation operations on rational numbers are 
well-defined, in the sense that if one replaces a//b with another rational number 
a'//b' which is equal to a//b, then the output of the above operations remains 
unchanged, and similarly for c//d. 


Proof We just verify this for addition; we leave the remaining claims to Exercise 
4.2.2. Suppose a//b = a'//b', so that b and b' are non-zero and ab' = a'b. We 
now show that a//b + c//d = a'//b' + c//d. By definition, the left-hand side is 
(ad + bc)//bd and the right-hand side is (a'd + b'c)//b'd, so we have to show that 


(ad + bc)b'd = (a'd + b'c)bd, 


which expands to 


ab'd^2 + bb'cd = a'bd^2 + bb'cd. 


But since ab' = a'b, the claim follows. Similarly if one replaces c//d by c'//d'. 
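
To see the well-definedness claim on a concrete instance, one can model a formal quotient a//b as a pair (a, b) of integers with b non-zero. This is only a sketch; the helper names q_eq, q_add, q_mul and q_neg are hypothetical, chosen for this illustration.

```python
# Assumption: the formal quotient a//b is modelled as the pair (a, b)
# with b non-zero. The helper names are illustrative only.

def q_eq(x, y):
    """a//b = c//d iff ad = cb (Definition 4.2.1)."""
    (a, b), (c, d) = x, y
    return a * d == c * b

def q_add(x, y):
    """(a//b) + (c//d) := (ad + bc)//(bd)."""
    (a, b), (c, d) = x, y
    return (a * d + b * c, b * d)

def q_mul(x, y):
    """(a//b) * (c//d) := (ac)//(bd)."""
    (a, b), (c, d) = x, y
    return (a * c, b * d)

def q_neg(x):
    """-(a//b) := (-a)//b."""
    a, b = x
    return (-a, b)

x, x_alt = (3, 4), (6, 8)   # two representations of the same rational
y = (5, 6)

print(q_add(x, y), q_add(x_alt, y))            # (38, 24) (76, 48)
print(q_eq(q_add(x, y), q_add(x_alt, y)))      # True
print(q_eq(q_mul(x, y), q_mul(x_alt, y)))      # True
print(q_eq(q_neg(x), q_neg(x_alt)))            # True
```

The two sums are different pairs, but they are equal as rational numbers, which is exactly what Lemma 4.2.3 asserts.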


We note that the rational numbers a//1 behave in a manner identical to the integers a: 


(a//1) + (b//1) = (a + b)//1; 
(a//1) × (b//1) = (ab)//1; 
-(a//1) = (-a)//1. 


Also, a//1 and b//1 are only equal when a and b are equal. Because of this, we will 
identify a with a//1 for each integer a: a = a//1; the above identities then guarantee 
that the arithmetic of the integers is consistent with the arithmetic of the rationals. 
Thus just as we embedded the natural numbers inside the integers, we embed the 
integers inside the rational numbers. In particular, all natural numbers are rational 
numbers, for instance 0 is equal to 0//1 and 1 is equal to 1//1. 

Observe that a rational number a//b is equal to 0 = 0//1 if and only if a × 1 = 
b × 0, i.e., if the numerator a is equal to 0. Thus if a and b are non-zero then so is 
a//b. 

We now define a new operation on the rationals: reciprocal. If x = a//b is a 
non-zero rational (so that a, b ≠ 0) then we define the reciprocal x^{-1} of x to be 
the rational number x^{-1} := b//a. It is easy to check that this operation is consistent 
with our notion of equality: if two rational numbers a//b, a'//b' are equal, then 
their reciprocals are also equal. (In contrast, an operation such as “numerator” is not 
well-defined: the rationals 3//4 and 6//8 are equal, but have unequal numerators, 
so we have to be careful when referring to such terms as “the numerator of x”.) We 
however leave the reciprocal of 0 undefined. 

We now summarize the algebraic properties of the rationals. 


Proposition 4.2.4 (Laws of algebra for rationals) Let x, y, z be rationals. Then the 
following laws of algebra hold: 


x + y = y + x
(x + y) + z = x + (y + z)
x + 0 = 0 + x = x
x + (-x) = (-x) + x = 0
xy = yx
(xy)z = x(yz)
x1 = 1x = x
x(y + z) = xy + xz
(y + z)x = yx + zx.


If x is non-zero, we also have 


xx^{-1} = x^{-1}x = 1. 


Remark 4.2.5 The above set of ten identities have a name; they are asserting that the 
rationals Q form a field. This is better than being a commutative ring because of the 
tenth identity xx^{-1} = x^{-1}x = 1. Note that this proposition supersedes Proposition 
4.1.6. 


Proof To prove these identities, one writes x = a//b, y = c//d, z = e//f for some 
integers a, c, e and non-zero integers b, d, f, and verifies each identity in turn using 
the algebra of the integers. We shall just prove the longest one, namely (x + y) + z = 
x + (y + z): 


(x + y) + z = ((a//b) + (c//d)) + (e//f)
            = ((ad + bc)//bd) + (e//f)
            = (adf + bcf + bde)//bdf;

x + (y + z) = (a//b) + ((c//d) + (e//f))
            = (a//b) + ((cf + de)//df)
            = (adf + bcf + bde)//bdf

and so one can see that (x + y) + z and x + (y + z) are equal. The other identities 
are proven in a similar fashion and are left to Exercise 4.2.3. 

We can now define the quotient x/y of two rational numbers x and y, provided 
that y is non-zero, by the formula 


x/y := x × y^{-1}. 


Thus, for instance 
(3//4)/(5//6) = (3//4) × (6//5) = (18//20) = (9//10). 


Using this formula, it is easy to see that a/b = a//b for every integer a and every 
non-zero integer b. Thus we can now discard the // notation, and use the more 
customary a/b instead of a//b. 
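
The worked quotient above can be reproduced with Python's standard fractions module. This is a sketch rather than part of the construction: Fraction already implements rational arithmetic, and the functions reciprocal and quotient below are illustrative names that simply mirror the definitions x^{-1} := b//a and x/y := x × y^{-1}.

```python
from fractions import Fraction

def reciprocal(x):
    """Reciprocal of a non-zero rational a//b, namely b//a."""
    if x == 0:
        raise ZeroDivisionError("the reciprocal of 0 is left undefined")
    return Fraction(x.denominator, x.numerator)

def quotient(x, y):
    """x/y := x * y^{-1}, defined only for non-zero y."""
    return x * reciprocal(y)

result = quotient(Fraction(3, 4), Fraction(5, 6))
print(result)                      # 9/10
print(result == Fraction(18, 20))  # True
```

Note that Fraction always stores a reduced representative, so 18//20 and 9//10 are printed identically; this reflects the fact that they are equal rational numbers.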

In a similar spirit, we define subtraction on the rationals by the formula 


x - y := x + (-y), 


just as we did with the integers. 

Proposition 4.2.4 allows us to use all the normal rules of algebra; we will now 
proceed to do so without further comment. 

In the previous section we organized the integers into positive, zero, and negative 
numbers. We now do the same for the rationals. 


Definition 4.2.6 A rational number x is said to be positive iff we have x = a/b for 
some positive integers a and b. It is said to be negative iff we have x = -y for some 
positive rational y (i.e., x = (-a)/b for some positive integers a and b). 


Thus for instance, every positive integer is a positive rational number, and every 
negative integer is a negative rational number, so our new definition is consistent 
with our old one. 


Lemma 4.2.7 (Trichotomy of rationals) Let x be a rational number. Then exactly 
one of the following three statements is true: (a) x is equal to 0. (b) x is a positive 
rational number. (c) x is a negative rational number. 


Proof See Exercise 4.2.4. 


Definition 4.2.8 (Ordering of the rationals) Let x and y be rational numbers. We say 
that x > y iff x - y is a positive rational number, and x < y iff x - y is a negative 
rational number. We write x ≥ y iff either x > y or x = y, and similarly define 
x ≤ y. 


Proposition 4.2.9 (Basic properties of order on the rationals) Let x, y, z be rational 
numbers. Then the following properties hold. 


(a) (Order trichotomy) Exactly one of the three statements x = y, x < y, or x > y 
is true. 
(b) (Order is antisymmetric) One has x < y if and only if y > x. 
(c) (Order is transitive) If x < y and y < z, then x < z. 
(d) (Addition preserves order) If x < y, then x + z < y + z. 
(e) (Positive multiplication preserves order) If x < y and z is positive, then xz < yz. 


Proof See Exercise 4.2.5. 


Remark 4.2.10 The above five properties in Proposition 4.2.9, combined with the 
field axioms in Proposition 4.2.4, have a name: they assert that the rationals Q form 
an ordered field. It is important to keep in mind that Proposition 4.2.9(e) only works 
when z is positive; see Exercise 4.2.6. 


— Exercises — 


Exercise 4.2.1 Show that the definition of equality for the rational numbers is reflexive, symmetric, 
and transitive. (Hint: for transitivity, use Corollary 4.1.9.) 


Exercise 4.2.2 Prove the remaining components of Lemma 4.2.3. 


Exercise 4.2.3 Prove the remaining components of Proposition 4.2.4. (Hint: as with Proposition 
4.1.6, you can save some work by using some identities to prove others.) 


Exercise 4.2.4 Prove Lemma 4.2.7. (Note that, as in Proposition 2.2.13, you have to prove two 
different things: firstly, that at least one of (a), (b), (c) is true; and secondly, that at most one of (a), 
(b), (c) is true.) 


Exercise 4.2.5 Prove Proposition 4.2.9. 


Exercise 4.2.6 Show that if x, y, z are rational numbers such that x < y and z is negative, then 
xz > yz. 


4.3 Absolute Value and Exponentiation 


We have already introduced the four basic arithmetic operations of addition, sub- 
traction, multiplication, and division on the rationals. (Recall that subtraction and 
division came from the more primitive notions of negation and reciprocal by the 
formulae x - y := x + (-y) and x/y := x × y^{-1}.) We also have a notion of order 
<, and have organized the rationals into the positive rationals, the negative rationals, 
and zero. In short, we have shown that the rationals Q form an ordered field. 

One can now use these basic operations to construct more operations. There are 
many such operations we can construct, but we shall just introduce two particularly 
useful ones: absolute value and exponentiation. 


Definition 4.3.1 (Absolute value) If x is a rational number, the absolute value |x| of 
x is defined as follows. If x is positive, then |x| := x. If x is negative, then |x| := —x. 
If x is zero, then |x| := 0. 


Definition 4.3.2 (Distance) Let x and y be rational numbers. The quantity |x - y| 
is called the distance between x and y and is sometimes denoted d(x, y), thus 
d(x, y) := |x - y|. For instance, d(3, 5) = 2. 


Proposition 4.3.3 (Basic properties of absolute value and distance) Let x, y, z be 
rational numbers. 


(a) (Non-degeneracy of absolute value) We have |x| ≥ 0. Also, |x| = 0 if and only 
if x is 0. 

(b) (Triangle inequality for absolute value) We have |x + y| ≤ |x| + |y|. 

(c) We have the inequalities -y ≤ x ≤ y if and only if y ≥ |x|. In particular, we 
have -|x| ≤ x ≤ |x|. 

(d) (Multiplicativity of absolute value) We have |xy| = |x||y|. In particular, |-x| = 
|x|. 

(e) (Non-degeneracy of distance) We have d(x, y) ≥ 0. Also, d(x, y) = 0 if and 
only if x = y. 

(f) (Symmetry of distance) d(x, y) = d(y, x). 

(g) (Triangle inequality for distance) d(x, z) ≤ d(x, y) + d(y, z). 


Proof See Exercise 4.3.1. 


Absolute value is useful for measuring how “close” two numbers are. Let us make 
a somewhat artificial definition: 


Definition 4.3.4 (ε-closeness) Let ε > 0 be a rational number, and let x, y be rational 
numbers. We say that y is ε-close to x iff we have d(y, x) ≤ ε. 


Remark 4.3.5 This definition is not standard in mathematics textbooks; we will use 
it as “scaffolding” to construct the more important notions of limits (and of Cauchy 
sequences) later on, and once we have those more advanced notions we will discard 
the notion of ε-close. 


Examples 4.3.6 The numbers 0.99 and 1.01 are 0.1-close, but they are not 0.01-close, 
because d(0.99, 1.01) = |0.99 - 1.01| = 0.02 is larger than 0.01. The numbers 2 
and 2 are ε-close for every positive ε. 
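
These examples can be checked with exact rational arithmetic. The sketch below reads the definition with the non-strict inequality d(y, x) ≤ ε; the function names d and is_close are chosen for this illustration only, and Fraction is used so that no rounding error enters.

```python
from fractions import Fraction

def d(x, y):
    """Distance d(x, y) := |x - y| between two rationals."""
    return abs(x - y)

def is_close(eps, y, x):
    """Return True iff y is eps-close to x, i.e. d(y, x) <= eps."""
    if eps <= 0:
        raise ValueError("eps-closeness is only defined for eps > 0")
    return d(y, x) <= eps

a, b = Fraction(99, 100), Fraction(101, 100)   # 0.99 and 1.01

print(d(a, b))                             # 1/50, i.e. 0.02
print(is_close(Fraction(1, 10), a, b))     # True:  0.02 <= 0.1
print(is_close(Fraction(1, 100), a, b))    # False: 0.02 >  0.01
print(is_close(Fraction(1, 10**9), Fraction(2), Fraction(2)))  # True
```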


We do not bother defining a notion of ε-close when ε is zero or negative, because 
if ε is zero then x and y are only ε-close when they are equal, and when ε is negative 
then x and y are never ε-close. (In any event it is a long-standing tradition in analysis 
that the Greek letters ε, δ should only denote small positive numbers.) 

Some basic properties of ε-closeness are the following. 


Proposition 4.3.7 Let x, y, z, w be rational numbers. 


(a) If x = y, then x is ε-close to y for every ε > 0. Conversely, if x is ε-close to y 
for every ε > 0, then we have x = y. 

(b) Let ε > 0. If x is ε-close to y, then y is ε-close to x. 

(c) Let ε, δ > 0. If x is ε-close to y, and y is δ-close to z, then x and z are (ε + δ)-close. 

(d) Let ε, δ > 0. If x and y are ε-close, and z and w are δ-close, then x + z and 
y + w are (ε + δ)-close, and x - z and y - w are also (ε + δ)-close. 

(e) Let ε > 0. If x and y are ε-close, they are also ε'-close for every ε' > ε. 

(f) Let ε > 0. If y and z are both ε-close to x, and w is between y and z (i.e., 
y ≤ w ≤ z or z ≤ w ≤ y), then w is also ε-close to x. 

(g) Let ε > 0. If x and y are ε-close, and z is non-zero, then xz and yz are ε|z|-close. 

(h) Let ε, δ > 0. If x and y are ε-close, and z and w are δ-close, then xz and yw 
are (ε|z| + δ|x| + εδ)-close. 


Proof We only prove the most difficult one, (h); we leave (a)-(g) to Exercise 4.3.2. 
Let ε, δ > 0, and suppose that x and y are ε-close. If we write a := y - x, then we 
have y = x + a and |a| ≤ ε. Similarly, if z and w are δ-close, and we define 
b := w - z, then w = z + b and |b| ≤ δ. 

Since y = x + a and w = z + b, we have 

yw = (x + a)(z + b) = xz + az + xb + ab. 

Thus 

|yw - xz| = |az + bx + ab| ≤ |az| + |bx| + |ab| = |a||z| + |b||x| + |a||b|. 


Since |a| ≤ ε and |b| ≤ δ, we thus have 


|yw - xz| ≤ ε|z| + δ|x| + εδ 


and thus that yw and xz are (ε|z| + δ|x| + εδ)-close. 
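
The inequality just proven can be tested on a small grid of rationals. In the sketch below (the name product_bound_holds is hypothetical) we take the smallest admissible values |y - x| and |w - z| in place of ε and δ, which is the hardest case; this is a numerical sanity check only, not a replacement for the proof.

```python
from fractions import Fraction
from itertools import product

def product_bound_holds(x, y, z, w):
    """Check |yw - xz| <= eps|z| + delta|x| + eps*delta,
    with eps := |y - x| and delta := |w - z|."""
    eps = abs(y - x)      # may be 0 when x = y; the inequality still holds
    delta = abs(w - z)
    return abs(y * w - x * z) <= eps * abs(z) + delta * abs(x) + eps * delta

# A small grid of rationals: -4/3, -1, -2/3, ..., 4/3.
samples = [Fraction(n, 3) for n in range(-4, 5)]

print(all(product_bound_holds(*t) for t in product(samples, repeat=4)))  # True
```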


Remark 4.3.8 One should compare statements (a)-(c) of this proposition with the 
reflexive, symmetric, and transitive axioms of equality. It is often useful to think of 
the notion of “ε-close” as an approximate substitute for that of equality in analysis. 


Now we recursively define exponentiation for natural number exponents, extending 
the previous definition in Definition 2.3.11. 


Definition 4.3.9 (Exponentiation to a natural number) Let x be a rational number. 
To raise x to the power 0, we define x^0 := 1; in particular we define 0^0 := 1. Now 
suppose inductively that x^n has been defined for some natural number n, then we 
define x^{n+1} := x^n × x. 

Proposition 4.3.10 (Properties of exponentiation, I) Let x, y be rational numbers, 
and let n,m be natural numbers. 


(a) We have x^n x^m = x^{n+m}, (x^n)^m = x^{nm}, and (xy)^n = x^n y^n. 

(b) Suppose n > 0. Then we have x^n = 0 if and only if x = 0. 

(c) If x ≥ y ≥ 0, then x^n ≥ y^n ≥ 0. If x > y ≥ 0 and n > 0, then x^n > y^n ≥ 0. 
(d) We have |x^n| = |x|^n. 


Proof See Exercise 4.3.3. 


Now we define exponentiation for negative integer exponents. 


Definition 4.3.11 (Exponentiation to a negative number) Let x be a non-zero rational 
number. Then for any negative integer -n, we define x^{-n} := 1/x^n. 


Thus for instance x^{-3} = 1/x^3 = 1/(x × x × x). Note that when n = 1, the definition 
of x^{-1} provided by Definition 4.3.11 coincides with the reciprocal of x defined 
in Sect. 4.2, so there is no incompatibility of notation caused by this new definition. 

We now have x^n defined for any integer n, whether n is positive, negative, or 
zero. Exponentiation with integer exponents has the following properties (which 
supersede Proposition 4.3.10): 


Proposition 4.3.12 (Properties of exponentiation, II) Let x, y be non-zero rational 
numbers, and let n,m be integers. 


(a) We have x^n x^m = x^{n+m}, (x^n)^m = x^{nm}, and (xy)^n = x^n y^n. 

(b) If x ≥ y > 0, then x^n ≥ y^n > 0 if n is positive, and 0 < x^n ≤ y^n if n is negative. 
(c) If x, y > 0, n ≠ 0, and x^n = y^n, then x = y. 

(d) We have |x^n| = |x|^n. 


Proof See Exercise 4.3.4. 
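
The two definitions of exponentiation translate directly into a recursive procedure. The following sketch (with the illustrative names power_nat and power_int) follows Definitions 4.3.9 and 4.3.11 literally instead of using Python's built-in ** operator, and then spot-checks a few of the laws in Proposition 4.3.12.

```python
from fractions import Fraction

def power_nat(x, n):
    """x^n for a natural number n: x^0 := 1 and x^{n+1} := x^n * x."""
    if n == 0:
        return Fraction(1)          # in particular 0^0 := 1
    return power_nat(x, n - 1) * x

def power_int(x, n):
    """x^n for an integer n; x must be non-zero when n is negative."""
    if n >= 0:
        return power_nat(x, n)
    if x == 0:
        raise ZeroDivisionError("negative powers of 0 are undefined")
    return 1 / power_nat(x, -n)     # x^{-n} := 1/x^n

x, y = Fraction(2, 3), Fraction(-5, 7)

print(power_int(x, 3), power_int(x, -3))   # 8/27 27/8
print(power_int(x, -2) * power_int(x, 5) == power_int(x, 3))       # True
print(power_int(x * y, -4) == power_int(x, -4) * power_int(y, -4)) # True
print(abs(power_int(y, -3)) == power_int(abs(y), -3))              # True
```

The recursion depth equals the exponent, so this is suitable only for small exponents; it is meant to mirror the definition, not to be efficient.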


— Exercises — 


Exercise 4.3.1 Prove Proposition 4.3.3. (Hint: while all of these claims can be proven by dividing 
into cases, such as when x is positive, negative, or zero, several parts of the proposition can be 
proven without such a tedious division into cases. For instance one can use earlier parts of the 
proposition to prove later ones.) 


Exercise 4.3.2 Prove the remaining claims in Proposition 4.3.7. 


Exercise 4.3.3 Prove Proposition 4.3.10. (Hint: use induction.) 


Exercise 4.3.4 Prove Proposition 4.3.12. (Hint: induction is not suitable here. Instead, use Propo- 
sition 4.3.10.) 


Exercise 4.3.5 Prove that 2^N ≥ N for all positive integers N. (Hint: use induction.) 


4.4 Gaps in the Rational Numbers 


Imagine that we arrange the rationals on a line, arranging x to the right of y if x > y. 
(This is a non-rigorous arrangement, since we have not yet defined the concept of a 
line, but this discussion is only intended to motivate the more rigorous propositions 
below.) Inside the rationals we have the integers, which are thus also arranged on the 
line. Now we work out how the rationals are arranged with respect to the integers. 


Proposition 4.4.1 (Interspersing of integers by rationals). Let x be a rational number. 
Then there exists an integer n such that n ≤ x < n + 1. In fact, this integer is 
unique (i.e., for each x there is only one n for which n ≤ x < n + 1). In particular, 
there exists a natural number N such that N > x (i.e., there is no such thing as a 
rational number which is larger than all the natural numbers). 


Remark 4.4.2 The integer n for which n ≤ x < n + 1 is sometimes referred to as 
the integer part of x and is sometimes denoted n = ⌊x⌋. 


Proof See Exercise 4.4.1. 
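
For a concrete rational, the integer part can be computed by floor division of numerator by denominator. The sketch below assumes Python's Fraction type, which always keeps its denominator positive; the function name integer_part is chosen for this illustration.

```python
from fractions import Fraction

def integer_part(x):
    """Return the unique integer n with n <= x < n + 1."""
    # The denominator of a Fraction is positive, and Python's // rounds
    # towards minus infinity, so this is the floor of x.
    n = x.numerator // x.denominator
    assert n <= x < n + 1
    return n

print(integer_part(Fraction(7, 2)))    # 3
print(integer_part(Fraction(-7, 2)))   # -4
print(integer_part(Fraction(5)))       # 5
```

Note the negative case: the integer part of -7/2 is -4, not -3, since we require n ≤ x.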


Also, between every two rational numbers there is at least one additional rational: 


Proposition 4.4.3 (Interspersing of rationals by rationals). If x and y are two rationals 
such that x < y, then there exists a third rational z such that x < z < y. 


Proof We set z := (x + y)/2. Since x < y, and 1/2 = 1//2 is positive, we have 
from Proposition 4.2.9 that x/2 < y/2. If we add y/2 to both sides using Proposition 
4.2.9 we obtain x/2 + y/2 < y/2 + y/2, i.e., z < y. If we instead add x/2 to both 
sides we obtain x/2 + x/2 < y/2 + x/2, i.e., x < z. Thus x < z < y as desired. 
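
The midpoint construction in this proof can be iterated to produce as many rationals between x and y as one likes. A brief sketch, with the hypothetical helper name between:

```python
from fractions import Fraction

def between(x, y):
    """Return the rational z := (x + y)/2, which satisfies x < z < y."""
    if not x < y:
        raise ValueError("need x < y")
    return (x + y) / 2

x, y = Fraction(1, 3), Fraction(1, 2)
z = between(x, y)
print(z)               # 5/12
print(x < z < y)       # True

# Repeating the construction squeezes further rationals in next to x.
for _ in range(3):
    z = between(x, z)
print(z, x < z < y)    # 11/32 True
```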


Despite the rationals having this denseness property, they are still incomplete; 
there are still an infinite number of “gaps” or “holes” between the rationals, although 
this denseness property does ensure that these holes are in some sense infinitely 
small. For instance, we will now show that the rational numbers do not contain any 
square root of two. 


Proposition 4.4.4 There does not exist any rational number x for which x^2 = 2. 


Proof We only give a sketch of a proof; the gaps will be filled in Exercise 4.4.3. 
Suppose for sake of contradiction that we had a rational number x for which x^2 = 2. 
Clearly x is not zero. We may assume that x is positive, for if x were negative then 
we could just replace x by -x (since x^2 = (-x)^2). Thus x = p/q for some positive 
integers p, q, so (p/q)^2 = 2, which we can rearrange as p^2 = 2q^2. Define a natural 
number p to be even if p = 2k for some natural number k, and odd if p = 2k + 1 
for some natural number k. Every natural number is either even or odd, but not both 
(why?). If p is odd, then p^2 is also odd (why?), which contradicts p^2 = 2q^2. Thus 
p is even, i.e., p = 2k for some natural number k. Since p is positive, k must also 
be positive. Inserting p = 2k into p^2 = 2q^2 we obtain 4k^2 = 2q^2, so that q^2 = 2k^2. 

To summarize, we started with a pair (p, q) of positive integers such that p^2 = 
2q^2, and ended up with a pair (q, k) of positive integers such that q^2 = 2k^2. Since 
p^2 = 2q^2, we have q < p (why?). If we rewrite p' := q and q' := k, we thus can 
pass from one solution (p, q) to the equation p^2 = 2q^2 to a new solution (p', q') 
to the same equation which has a smaller value of p. But then we can repeat this 
procedure again and again, obtaining a sequence (p'', q''), (p''', q'''), etc., of solutions 
to p^2 = 2q^2, each one with a smaller value of p than the previous, and each one 
consisting of positive integers. But this contradicts the principle of infinite descent 
(see Exercise 4.4.2). This contradiction shows that we could not have had a rational 
x for which x^2 = 2. 
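
As an empirical companion to this proof, one can search for positive integer solutions of p^2 = 2q^2 with small q. The sketch below (the function name solutions is illustrative) finds none, as the proposition predicts; a finite search proves nothing by itself, of course, which is why the descent argument is needed.

```python
from math import isqrt

def solutions(bound):
    """All pairs (p, q) of positive integers with q <= bound and p^2 = 2q^2."""
    found = []
    for q in range(1, bound + 1):
        target = 2 * q * q
        p = isqrt(target)           # largest integer p with p^2 <= target
        if p * p == target:
            found.append((p, q))
    return found

print(solutions(10_000))   # []
```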


On the other hand, we can get rational numbers which are arbitrarily close to a 
square root of 2: 


Proposition 4.4.5 For every rational number ε > 0, there exists a non-negative 
rational number x such that x^2 < 2 < (x + ε)^2. 


Proof Let ε > 0 be rational. Suppose for sake of contradiction that there is no non-negative 
rational number x for which x^2 < 2 < (x + ε)^2. This means that whenever 
x is non-negative and x^2 < 2, we must also have (x + ε)^2 < 2 (note that (x + ε)^2 
cannot equal 2, by Proposition 4.4.4). Since 0^2 < 2, we thus have ε^2 < 2, which 
then implies (2ε)^2 < 2, and indeed a simple induction shows that (nε)^2 < 2 for 
every natural number n. (Note that nε is non-negative for every natural number n; 
why?) But, by Proposition 4.4.1 we can find an integer n such that n > 2/ε, which 
implies that nε > 2, which implies that (nε)^2 > 4 > 2, contradicting the claim that 
(nε)^2 < 2 for all natural numbers n. This contradiction gives the proof. 


Example 4.4.6 If³ ε = 0.001, we can take x = 1.414, since x^2 = 1.999396 and 
(x + ε)^2 = 2.002225. 
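
The proof of Proposition 4.4.5 is by contradiction, but the same idea of stepping through the multiples 0, ε, 2ε, ... can be run forwards to actually find a suitable x. The following is a sketch of that idea under the name approx_sqrt2 (a hypothetical helper, not taken from the text): it returns the last multiple nε whose square is still below 2.

```python
from fractions import Fraction

def approx_sqrt2(eps):
    """Return a non-negative rational x = n*eps with x^2 < 2 < (x + eps)^2."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = 0
    # Advance while the next multiple of eps still has square below 2.
    # The loop terminates because (n*eps)^2 > 2 once n > 2/eps.
    while ((n + 1) * eps) ** 2 < 2:
        n += 1
    return n * eps

eps = Fraction(1, 1000)
x = approx_sqrt2(eps)
print(x == Fraction(1414, 1000))        # True
print(x ** 2 < 2 < (x + eps) ** 2)      # True
```

When the loop stops we know ((n + 1)ε)^2 ≥ 2, and equality is impossible by Proposition 4.4.4, so the strict inequality 2 < (x + ε)^2 holds. The loop takes roughly 1.5/ε steps, so it is only practical for moderate ε.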


Proposition 4.4.5 indicates that, while the set Q of rationals does not actually have 
√2 as a member, we can get as close as we wish to √2. For instance, the sequence 
of rationals 

1.4, 1.41, 1.414, 1.4142, 1.41421, ... 


seems to get closer and closer to √2, as their squares indicate: 
1.96, 1.9881, 1.999396, 1.99996164, 1.9999899241, ... 
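
The squares listed above can be recomputed exactly using Python's decimal module, which performs exact arithmetic on terminating decimals of this size. This is merely a convenience check; the variable names are illustrative.

```python
from decimal import Decimal

approximations = ["1.4", "1.41", "1.414", "1.4142", "1.41421"]

# Each product has at most 11 significant digits, well within the
# default precision of 28 digits, so no rounding takes place.
squares = [str(Decimal(s) * Decimal(s)) for s in approximations]

print(squares)
# ['1.96', '1.9881', '1.999396', '1.99996164', '1.9999899241']
```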


Thus it seems that we can create a square root of 2 by taking a “limit” of a sequence of 
rationals. This is how we shall construct the real numbers in the next chapter. (There 
is another way to do so, using something called “Dedekind cuts”, which we will not 
pursue here. One can also proceed using infinite decimal expansions, but there are 
some sticky issues when doing so, e.g., one has to make 0.999... equal to 1.000..., 
and this approach, despite being the most familiar, is actually more complicated than 
other approaches; see Appendix B.) 


— Exercises — 
Exercise 4.4.1 Prove Proposition 4.4.1. (Hint: use Proposition 2.3.9.) 


Exercise 4.4.2 A definition: a sequence a_0, a_1, a_2, ... of numbers (natural numbers, integers, 
rationals, or reals) is said to be in infinite descent if we have a_n > a_{n+1} for all natural numbers n (i.e., 
a_0 > a_1 > a_2 > ...). 

(a) Prove the principle of infinite descent: that it is not possible to have a sequence of natural 
numbers which is in infinite descent. (Hint: assume for sake of contradiction that you can find 
a sequence of natural numbers which is in infinite descent. Since all the a_n are natural numbers, 
you know that a_n ≥ 0 for all n. Now use induction to show in fact that a_n ≥ k for all k ∈ N 
and all n ∈ N, and obtain a contradiction.) 

(b) Does the principle of infinite descent work if the sequence a_1, a_2, a_3, ... is allowed to take 
integer values instead of natural number values? What about if it is allowed to take positive 
rational values instead of natural numbers? Explain. 


³ We will use the decimal system for defining terminating decimals, for instance 1.414 is defined 
to equal the rational number 1414/1000. For a formal discussion on the decimal system, see 
Appendix B. 


Exercise 4.4.3 Fill in the gaps marked (why?) in the proof of Proposition 4.4.4. Is the axiom of 
choice required to establish this proposition? 


Chapter 5 
The Real Numbers 


To review our progress to date, we have rigorously constructed three fundamental 
number systems: the natural number system N, the integers Z, and the rationals Q.¹ 
We defined the natural numbers using the five Peano axioms and postulated that such 
a number system existed; this is plausible, since the natural numbers correspond to 
the very intuitive and fundamental notion of sequential counting. Using that number 
system one could then recursively define addition and multiplication, and verify that 
they obeyed the usual laws of algebra. We then constructed the integers by taking 
formal² differences of the natural numbers, a--b. We then constructed the rationals 
by taking formal quotients of the integers, a//b, although we need to exclude division 
by zero in order to keep the laws of algebra reasonable. (You are of course free to 
design your own number system, possibly including one where division by zero 
is permitted; but you will have to give up one or more of the field axioms from 
Proposition 4.2.4, among other things, and you will probably get a less useful number 
system in which to do any real-world problems.) 

The rational system is already sufficient to do a lot of mathematics—much of high 
school algebra, for instance, works just fine if one only knows about the rationals. 
However, there is a fundamental area of mathematics where the rational number 
system does not suffice—that of geometry (the study of lengths, areas, etc.). For 
instance, a right-angled triangle with both sides equal to 1 gives a hypotenuse of 
√2, which is an irrational number, i.e., not a rational number; see Proposition 4.4.4. 
Things get even worse when one starts to deal with the subfield of geometry known 
as trigonometry, when one sees numbers such as π or cos(1), which turn out to 
be in some sense “even more” irrational than √2. (These numbers are known as 
transcendental numbers, but to discuss this further would be far beyond the scope 
of this text.) Thus, in order to have a number system which can adequately describe 
geometry, or even something as simple as measuring lengths on a line, one needs 
to replace the rational number system with the real number system. Since differential 
and integral calculus is also intimately tied up with geometry (think of slopes of 
tangents, or areas under a curve), calculus also requires the real number system in 
order to function properly. 


¹ The symbols N, Q, and R stand for “natural”, “quotient”, and “real” respectively. Z stands for 
“Zahlen”, the German word for “numbers”. There are also the complex numbers C (which obviously 
stands for “complex”), which you will see in Sect. 4.6 of Analysis II. 

² Formal means “having the form of”; at the beginning of our construction the expression a--b 
did not actually mean the difference a - b, since the symbol -- was meaningless. It only had the 
form of a difference. Later on we defined subtraction and verified that the formal difference was 
equal to the actual difference, so this eventually became a non-issue, and our symbol for formal 
differencing was discarded. Somewhat confusingly, this use of the term “formal” is unrelated to the 
notions of a formal argument and an informal argument. 

However, a rigorous way to construct the reals from the rationals turns out to 
be somewhat difficult—requiring a bit more machinery than what was needed to 
pass from the naturals to the integers, or the integers to the rationals. In those two 
constructions, the task was to introduce one more algebraic operation to the number 
system—e.g., one can get integers from naturals by introducing subtraction, and get 
the rationals from the integers by introducing division. But to get the reals from the 
rationals is to pass from a “discrete” system to a “continuous” one and requires the 
introduction of a somewhat different notion—that of a limit. The limit is a concept 
which on one level is quite intuitive, but to pin down rigorously turns out to be quite 
challenging. (Even such great mathematicians as Euler and Newton had difficulty 
with this concept. It was only in the nineteenth century that mathematicians such as 
Cauchy and Dedekind figured out how to deal with limits rigorously.) 

In Sect. 4.4 we explored the “gaps” in the rational numbers; now we shall fill in 
these gaps using limits to create the real numbers. The real number system will end 
up being a lot like the rational numbers but will have some new operations—notably 
that of supremum, which can then be used to define limits and thence to everything 
else that calculus needs. 

The procedure we give here of obtaining the real numbers as the limit of sequences 
of rational numbers may seem rather complicated. However, it is in fact an instance 
of a very general and useful procedure, that of completing one metric space to form 
another; see Exercise 1.4.8 of Analysis II. 


5.1 Cauchy Sequences 


Our construction of the real numbers shall rely on the concept of a Cauchy sequence. 
Before we define this notion formally, let us first define the concept of a sequence. 


Definition 5.1.1 (Sequences). Let $m$ be an integer. A sequence $(a_n)_{n=m}^\infty$ of rational 
numbers is any function from the set $\{n \in \mathbf{Z} : n \geq m\}$ to $\mathbf{Q}$, i.e., a mapping which 
assigns to each integer $n$ greater than or equal to $m$, a rational number $a_n$. More 
informally, a sequence $(a_n)_{n=m}^\infty$ of rational numbers is a collection of rationals $a_m$, 
$a_{m+1}$, $a_{m+2}$, $\ldots$. 


Example 5.1.2 The sequence $(n^2)_{n=0}^\infty$ is the collection 0, 1, 4, 9, ... of natural numbers; 
the sequence $(3)_{n=0}^\infty$ is the collection 3, 3, 3, ... of natural numbers. These 
sequences are indexed starting from 0, but we can of course make sequences starting 
from 1 or any other number; for instance, the sequence $(a_n)_{n=3}^\infty$ denotes the sequence 
$a_3, a_4, a_5, \ldots$, so $(n^2)_{n=3}^\infty$ is the collection 9, 16, 25, ... of natural numbers. 


We want to define the real numbers as the limits of sequences of rational numbers. 
To do so, we have to distinguish which sequences of rationals are convergent and 
which ones are not. For instance, the sequence 


1.4, 1.41, 1.414, 1.4142, 1.41421, ... 

looks like it is trying to converge to something, as does 

0.1, 0.01, 0.001, 0.0001, ... 

while other sequences such as 

1, 2, 4, 8, 16, ... 

or 

1, 0, 1, 0, 1, ... 

do not. To do this we use the definition of $\varepsilon$-closeness defined earlier. Recall from 
Definition 4.3.4 that two rational numbers $x$, $y$ are $\varepsilon$-close if $d(x, y) = |x - y| \leq \varepsilon$. 


Definition 5.1.3 ($\varepsilon$-steadiness). Let $\varepsilon > 0$. A sequence $(a_n)_{n=0}^\infty$ is said to be $\varepsilon$-steady 
iff each pair $a_j$, $a_k$ of sequence elements is $\varepsilon$-close for every natural number $j$, $k$. In 
other words, the sequence $a_0, a_1, a_2, \ldots$ is $\varepsilon$-steady iff $|a_j - a_k| \leq \varepsilon$ for all $j, k$. 


Remark 5.1.4 This definition is not standard in the literature; we will not need it 
outside of this section; similarly for the concept of “eventual $\varepsilon$-steadiness” below. 
We have defined $\varepsilon$-steadiness for sequences whose index starts at 0, but clearly we 
can make a similar notion for sequences whose indices start from any other number: 
a sequence $a_N, a_{N+1}, \ldots$ is $\varepsilon$-steady if one has $|a_j - a_k| \leq \varepsilon$ for all $j, k \geq N$. 


Example 5.1.5 The sequence 1, 0, 1, 0, 1, ... is 1-steady but is not 1/2-steady. The 
sequence 0.1, 0.01, 0.001, 0.0001, ... is 0.1-steady, but is not 0.01-steady (why?). 
The sequence 1, 2, 4, 8, 16, ... is not $\varepsilon$-steady for any $\varepsilon$ (why?). The sequence 
2, 2, 2, 2, ... is $\varepsilon$-steady for every $\varepsilon > 0$. 
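
The claims in this example can be spot-checked on finite prefixes. The following Python sketch is only an illustration: the helper name `is_steady_prefix` is hypothetical, the rationals are modelled with the standard `fractions.Fraction` type, and a finite check can refute $\varepsilon$-steadiness of an infinite sequence but can never prove it.

```python
from fractions import Fraction
from itertools import combinations


def is_steady_prefix(terms, eps):
    """Return True iff every pair of the given terms is eps-close, i.e. |x - y| <= eps."""
    return all(abs(x - y) <= eps for x, y in combinations(terms, 2))


# First ten terms of sequences from Example 5.1.5, stored as exact rationals.
alternating = [Fraction(1 - n % 2) for n in range(10)]    # 1, 0, 1, 0, ...
tenths = [Fraction(1, 10 ** n) for n in range(1, 11)]     # 0.1, 0.01, 0.001, ...
constant = [Fraction(2)] * 10                             # 2, 2, 2, ...

print(is_steady_prefix(alternating, Fraction(1)))         # True
print(is_steady_prefix(alternating, Fraction(1, 2)))      # False
print(is_steady_prefix(tenths, Fraction(1, 10)))          # True
print(is_steady_prefix(tenths, Fraction(1, 100)))         # False
print(is_steady_prefix(constant, Fraction(1, 1000)))      # True
```

A `False` answer settles the matter for the whole sequence (one bad pair is enough), whereas a `True` answer only says that no counterexample occurs among the terms examined.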


The notion of $\varepsilon$-steadiness of a sequence is simple, but does not really capture the 
limiting behavior of a sequence, because it is too sensitive to the initial members of 
the sequence. For instance, the sequence 


10, 0, 0, 0, 0, 0, ... 


is 10-steady, but is not $\varepsilon$-steady for any smaller value of $\varepsilon$, despite the sequence 
converging almost immediately to zero. So we need a more robust notion of steadiness 
that does not care about the initial members of a sequence. 


Definition 5.1.6 (Eventual $\varepsilon$-steadiness). Let $\varepsilon > 0$. A sequence $(a_n)_{n=0}^\infty$ is said to be 
eventually $\varepsilon$-steady iff the sequence $a_N, a_{N+1}, a_{N+2}, \ldots$ is $\varepsilon$-steady for some natural 
number $N \geq 0$. In other words, the sequence $a_0, a_1, a_2, \ldots$ is eventually $\varepsilon$-steady iff 
there exists an $N \geq 0$ such that $|a_j - a_k| \leq \varepsilon$ for all $j, k \geq N$. 


Example 5.1.7 The sequence $a_1, a_2, \ldots$ defined by $a_n := 1/n$ (i.e., the sequence 
1, 1/2, 1/3, 1/4, ...) is not 0.1-steady, but is eventually 0.1-steady, because the 
sequence $a_{10}, a_{11}, a_{12}, \ldots$ (i.e., 1/10, 1/11, 1/12, ...) is 0.1-steady. The sequence 
10, 0, 0, 0, 0, ... is not $\varepsilon$-steady for any $\varepsilon$ less than 10, but it is eventually $\varepsilon$-steady 
for every $\varepsilon > 0$ (why?). 


Now we can finally define the correct notion of what it means for a sequence of 
rationals to “want” to converge. 


Definition 5.1.8 (Cauchy sequences). A sequence $(a_n)_{n=0}^\infty$ of rational numbers is 
said to be a Cauchy sequence iff for every rational $\varepsilon > 0$, the sequence $(a_n)_{n=0}^\infty$ is 
eventually $\varepsilon$-steady. In other words, the sequence $a_0, a_1, a_2, \ldots$ is a Cauchy sequence 
iff for every rational $\varepsilon > 0$, there exists an $N \geq 0$ such that $d(a_j, a_k) \leq \varepsilon$ for all $j, k \geq N$. 


Remark 5.1.9 At present, the parameter $\varepsilon$ is restricted to be a positive rational; we 
cannot take $\varepsilon$ to be an arbitrary positive real number, because the real numbers have 
not yet been constructed. However, once we do construct the real numbers, we shall 
see that the above definition will not change if we require $\varepsilon$ to be real instead of 
rational. In other words, we will eventually prove that a sequence is eventually 
$\varepsilon$-steady for every rational $\varepsilon > 0$ if and only if it is eventually $\varepsilon$-steady for every real 
$\varepsilon > 0$; see Proposition 6.1.4. This rather subtle distinction between a rational $\varepsilon$ and 
a real $\varepsilon$ turns out not to be very important in the long run, and the reader is advised 
not to pay too much attention as to what type of number $\varepsilon$ should be. 


Example 5.1.10 (Informal) Consider the sequence 

1.4, 1.41, 1.414, 1.4142, ... 

mentioned earlier. This sequence is already 0.1-steady. If one discards the first 
element 1.4, then the remaining sequence 

1.41, 1.414, 1.4142, ... 

is now 0.01-steady, which means that the original sequence was eventually 
0.01-steady. Discarding the next element gives the 0.001-steady sequence 1.414, 1.4142, 
...; thus the original sequence was eventually 0.001-steady. Continuing in this way 
it seems plausible that this sequence is in fact eventually $\varepsilon$-steady for every $\varepsilon > 0$, 
which seems to suggest that this is a Cauchy sequence. However, this discussion is not rigorous 
for several reasons; for instance, we have not precisely defined what this sequence 
1.4, 1.41, 1.414, ... really is. An example of a rigorous treatment follows next. 


Proposition 5.1.11 The sequence $a_1, a_2, a_3, \ldots$ defined by $a_n := 1/n$ (i.e., the 
sequence 1, 1/2, 1/3, ...) is a Cauchy sequence. 


Proof We have to show that for every $\varepsilon > 0$, the sequence $a_1, a_2, \ldots$ is eventually 
$\varepsilon$-steady. So let $\varepsilon > 0$ be arbitrary. We now have to find a number $N \geq 1$ such that 
the sequence $a_N, a_{N+1}, \ldots$ is $\varepsilon$-steady. Let us see what this means. This means that 
$d(a_j, a_k) \leq \varepsilon$ for every $j, k \geq N$, i.e., 


$|1/j - 1/k| \leq \varepsilon$ for every $j, k \geq N$. 


Now since $j, k \geq N$, we know that $0 < 1/j, 1/k \leq 1/N$, so that $|1/j - 1/k| \leq 1/N$. 
So in order to force $|1/j - 1/k|$ to be less than or equal to $\varepsilon$, it would be sufficient 
for $1/N$ to be less than $\varepsilon$. So all we need to do is choose an $N$ such that $1/N$ is less 
than $\varepsilon$, or in other words that $N$ is greater than $1/\varepsilon$. But this can be done thanks to 
Proposition 4.4.1. 
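
The choice of $N$ in this proof can be made concrete. The sketch below is an illustration under stated assumptions, not part of the proof: `steadiness_index` and `check_tail` are hypothetical names, $\varepsilon$ is passed as a `fractions.Fraction`, and the check only inspects a finite window of the tail, so it can support the argument but not replace it.

```python
from fractions import Fraction
from math import floor


def steadiness_index(eps):
    """Return an integer N >= 1 with N > 1/eps, so that 1/N < eps."""
    return floor(1 / eps) + 1


def check_tail(eps, width=200):
    """Check |1/j - 1/k| <= eps for all N <= j, k < N + width (a finite window only)."""
    N = steadiness_index(eps)
    tail = [Fraction(1, n) for n in range(N, N + width)]
    return all(abs(x - y) <= eps for x in tail for y in tail)


eps = Fraction(1, 10)
print(steadiness_index(eps))   # 11, since 11 > 1/eps = 10
print(check_tail(eps))         # True
```

Note the order of reasoning mirrors the proof: first one decides what condition on $N$ would suffice ($N > 1/\varepsilon$), and only then picks an $N$ satisfying it.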


As you can see, verifying from first principles (i.e., without using any of the 
machinery of limits, etc.) that a sequence is a Cauchy sequence requires some effort, 
even for a sequence as simple as $1/n$. The part about selecting an $N$ can be particularly 
difficult for beginners: one has to think in reverse, working out what conditions on 
$N$ would suffice to force the sequence $a_N, a_{N+1}, a_{N+2}, \ldots$ to be $\varepsilon$-steady, and then 
finding an N which obeys those conditions. Later we will develop some limit laws 
which allow us to determine when a sequence is Cauchy more easily. 

We now relate the notion of a Cauchy sequence to another basic notion, that of a 
bounded sequence. 


Definition 5.1.12 (Bounded sequences). Let $M \geq 0$ be rational. A finite sequence 
$a_1, a_2, \ldots, a_n$ is bounded by $M$ iff $|a_i| \leq M$ for all $1 \leq i \leq n$. An infinite sequence 
$(a_n)_{n=1}^\infty$ is bounded by $M$ iff $|a_i| \leq M$ for all $i \geq 1$. A sequence is said to be bounded 
iff it is bounded by $M$ for some rational $M \geq 0$. 


Example 5.1.13 The finite sequence 1, -2, 3, -4 is bounded (in this case, it is 
bounded by 4, or indeed by any $M$ greater than or equal to 4). But the infinite sequence 
1, -2, 3, -4, 5, -6, ... is unbounded. (Can you prove this? Use Proposition 4.4.1.) 
The sequence 1, -1, 1, -1, ... is bounded (e.g., by 1), but is not a Cauchy sequence. 


Lemma 5.1.14 (Finite sequences are bounded). Every finite sequence $a_1, a_2, \ldots, a_n$ 
is bounded. 


Proof We prove this by induction on $n$. When $n = 1$ the sequence $a_1$ is clearly 
bounded, for if we choose $M := |a_1|$ then clearly we have $|a_i| \leq M$ for all $1 \leq i \leq n$. 
Now suppose that we have already proved the lemma for some $n \geq 1$; we now 
prove it for $n + 1$, i.e., we prove every sequence $a_1, a_2, \ldots, a_{n+1}$ is bounded. By 
the induction hypothesis we know that $a_1, a_2, \ldots, a_n$ is bounded by some $M \geq 0$; 
in particular, it must be bounded by $M + |a_{n+1}|$. On the other hand, $a_{n+1}$ is also 
bounded by $M + |a_{n+1}|$. Thus $a_1, a_2, \ldots, a_n, a_{n+1}$ is bounded by $M + |a_{n+1}|$, and 
is hence bounded. This closes the induction. 
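
The induction above is constructive: it produces an explicit, if wasteful, bound. As a small Python sketch (the function name `inductive_bound` is hypothetical and the rationals are modelled with `fractions.Fraction`), one can unroll the induction into a loop that starts from $M := |a_1|$ and replaces $M$ by $M + |a_{n+1}|$ at each step.

```python
from fractions import Fraction


def inductive_bound(terms):
    """Bound built exactly as in the induction: M_1 = |a_1|, M_{n+1} = M_n + |a_{n+1}|."""
    if not terms:
        raise ValueError("the lemma is stated for sequences with at least one term")
    M = abs(terms[0])
    for a in terms[1:]:
        M = M + abs(a)
    return M


seq = [Fraction(1), Fraction(-2), Fraction(3), Fraction(-4)]
M = inductive_bound(seq)
print(M)                               # 10
print(all(abs(a) <= M for a in seq))   # True
```

For the sequence 1, -2, 3, -4 this yields the bound 10 rather than the sharper bound 4; the proof only needs some bound to exist, not the best one.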


Note that while this argument shows that every finite sequence is bounded, no 
matter how long the finite sequence is, it does not say anything about whether an 
infinite sequence is bounded or not; infinity is not a natural number. However, we 
have 


Lemma 5.1.15 (Cauchy sequences are bounded). Every Cauchy sequence $(a_n)_{n=1}^\infty$ 
is bounded. 


Proof See Exercise 5.1.1. 


— Exercises — 


Exercise 5.1.1 Prove Lemma 5.1.15. (Hint: use the fact that $a_n$ is eventually 1-steady, and thus can 
be split into a finite sequence and a 1-steady sequence. Then use Lemma 5.1.14 for the finite part. 
Note there is nothing special about the number 1 used here; any other positive number would have 
sufficed.) 


Exercise 5.1.2 If $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are bounded sequences, show that $(a_n + b_n)_{n=1}^\infty$, 
$(a_n - b_n)_{n=1}^\infty$, and $(a_n b_n)_{n=1}^\infty$ are also bounded. 


5.2 Equivalent Cauchy Sequences 


Consider the two Cauchy sequences of rational numbers: 
1.4, 1.41, 1.414, 1.4142, 1.41421,... 


and 
1.5, 1.42, 1.415, 1.4143, 1.41422,... 


Informally, both of these sequences seem to be converging to the same number, the 
square root $\sqrt{2} = 1.41421\ldots$ (though this statement is not yet rigorous because 
we have not defined real numbers yet). If we are to define the real numbers from 
the rationals as limits of Cauchy sequences, we have to know when two Cauchy 
sequences of rationals give the same limit, without first defining a real number (since 
that would be circular). To do this we use a similar set of definitions to those used to 
define a Cauchy sequence in the first place. 


Definition 5.2.1 ($\varepsilon$-close sequences). Let $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ be two sequences, 
and let $\varepsilon > 0$. We say that the sequence $(a_n)_{n=0}^\infty$ is $\varepsilon$-close to $(b_n)_{n=0}^\infty$ iff $a_n$ is $\varepsilon$-close 
to $b_n$ for each $n \in \mathbf{N}$. In other words, the sequence $a_0, a_1, a_2, \ldots$ is $\varepsilon$-close to the 
sequence $b_0, b_1, b_2, \ldots$ iff $|a_n - b_n| \leq \varepsilon$ for all $n = 0, 1, 2, \ldots$. 


Example 5.2.2 The two sequences 

1, -1, 1, -1, 1, ... 

and 

1.1, -1.1, 1.1, -1.1, 1.1, ... 

are 0.1-close to each other. (Note however that neither of them is 0.1-steady.) 


Definition 5.2.3 (Eventually $\varepsilon$-close sequences). Let $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ be two 
sequences, and let $\varepsilon > 0$. We say that the sequence $(a_n)_{n=0}^\infty$ is eventually $\varepsilon$-close to 
$(b_n)_{n=0}^\infty$ iff there exists an $N \geq 0$ such that the sequences $(a_n)_{n=N}^\infty$ and $(b_n)_{n=N}^\infty$ are 
$\varepsilon$-close. In other words, $a_0, a_1, a_2, \ldots$ is eventually $\varepsilon$-close to $b_0, b_1, b_2, \ldots$ iff there 
exists an $N \geq 0$ such that $|a_n - b_n| \leq \varepsilon$ for all $n \geq N$. 


Remark 5.2.4 Again, the notions of $\varepsilon$-close sequences and eventually $\varepsilon$-close 
sequences are not standard in the literature, and we will not use them outside of 
this section. 


Example 5.2.5. The two sequences 
1.1, 1.01, 1.001, 1.0001,... 


and 
0.9, 0.99, 0.999, 0.9999, ... 


are not 0.1-close (because the first elements of both sequences are not 0.1-close to 
each other). However, the sequences are still eventually 0.1-close, because if we start 
from the second elements onwards in the sequence, these sequences are 0.1-close. A 
similar argument shows that the two sequences are eventually 0.01-close (by starting 
from the third element onwards), and so forth. 


Definition 5.2.6 (Equivalent sequences). Two sequences $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ are 
equivalent iff for each rational $\varepsilon > 0$, the sequences $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ are eventually 
$\varepsilon$-close. In other words, $a_0, a_1, a_2, \ldots$ and $b_0, b_1, b_2, \ldots$ are equivalent iff for 
every rational $\varepsilon > 0$, there exists an $N \geq 0$ such that $|a_n - b_n| \leq \varepsilon$ for all $n \geq N$. 


Remark 5.2.7 As with Definition 5.1.8, the quantity $\varepsilon > 0$ is currently restricted to 
be a positive rational, rather than a positive real. However, we shall eventually see 
that it makes no difference whether $\varepsilon$ ranges over the positive rationals or positive 
reals; see Exercise 6.1.10. 


From Definition 5.2.6 it seems that the two sequences given in Example 5.2.5 
are equivalent. We now prove this rigorously. 


Proposition 5.2.8 Let $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ be the sequences $a_n = 1 + 10^{-n}$ and 
$b_n = 1 - 10^{-n}$. Then the sequences $a_n$, $b_n$ are equivalent. 


Remark 5.2.9 This proposition, in decimal notation, asserts that 1.0000... = 0.9999...; 
see Proposition B.2.3. 


Proof We need to prove that for every $\varepsilon > 0$, the two sequences $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ 
are eventually $\varepsilon$-close to each other. So we fix an $\varepsilon > 0$. We need to find an $N > 0$ 
such that $(a_n)_{n=N}^\infty$ and $(b_n)_{n=N}^\infty$ are $\varepsilon$-close; in other words, we need to find an $N > 0$ 
such that 


$|a_n - b_n| \leq \varepsilon$ for all $n \geq N$. 

However, we have 

$|a_n - b_n| = |(1 + 10^{-n}) - (1 - 10^{-n})| = 2 \times 10^{-n}$. 


Since $10^{-n}$ is a decreasing function of $n$ (i.e., $10^{-m} < 10^{-n}$ whenever $m > n$; this 
is easily proven by induction), and $n \geq N$, we have $2 \times 10^{-n} \leq 2 \times 10^{-N}$. Thus we 
have 

$|a_n - b_n| \leq 2 \times 10^{-N}$ for all $n \geq N$. 


Thus in order to obtain $|a_n - b_n| \leq \varepsilon$ for all $n \geq N$, it will be sufficient to choose 
$N$ so that $2 \times 10^{-N} \leq \varepsilon$. This is easy to do using logarithms, but we have not 
developed logarithms yet, so we will use a cruder method. First, we observe $10^N$ 
is always greater than $N$ for any $N \geq 1$ (see Exercise 4.3.5). Thus $10^{-N} \leq 1/N$, 
and so $2 \times 10^{-N} \leq 2/N$. Thus to get $2 \times 10^{-N} \leq \varepsilon$, it will suffice to choose $N$ so 
that $2/N \leq \varepsilon$, or equivalently that $N \geq 2/\varepsilon$. But by Proposition 4.4.1 we can always 
choose such an $N$, and the claim follows. 
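
The crude choice of $N$ made in this proof is easy to compute. The following Python sketch is illustrative only: the names `a`, `b` and `closeness_index` are chosen here for convenience, $\varepsilon$ is assumed to be a `fractions.Fraction`, and the final check covers just a finite window of indices.

```python
from fractions import Fraction
from math import ceil


def a(n):
    """a_n = 1 + 10^(-n), as an exact rational."""
    return 1 + Fraction(1, 10 ** n)


def b(n):
    """b_n = 1 - 10^(-n), as an exact rational."""
    return 1 - Fraction(1, 10 ** n)


def closeness_index(eps):
    """An integer N >= 1 with N >= 2/eps (the crude choice from the proof)."""
    return max(1, ceil(2 / eps))


eps = Fraction(1, 100)
N = closeness_index(eps)
print(N)                                                         # 200
print(abs(a(3) - b(3)))                                          # 1/500, i.e. 2 x 10^(-3)
print(all(abs(a(n) - b(n)) <= eps for n in range(N, N + 50)))    # True
```

The value $N = 200$ is far larger than necessary (here $N = 3$ would already do), which is the price of avoiding logarithms; the proof only needs some valid $N$.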


— Exercises — 


Exercise 5.2.1 Show that if $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are equivalent sequences of rationals, then $(a_n)_{n=1}^\infty$ 
is a Cauchy sequence if and only if $(b_n)_{n=1}^\infty$ is a Cauchy sequence. 


Exercise 5.2.2 Let $\varepsilon > 0$. Show that if $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are eventually $\varepsilon$-close, then $(a_n)_{n=1}^\infty$ 
is bounded if and only if $(b_n)_{n=1}^\infty$ is bounded. 


5.3 The Construction of the Real Numbers 


We are now ready to construct the real numbers. We shall introduce a new formal 
symbol LIM, similar to the formal notations $--$ and $//$ defined earlier; as the notation 
suggests, this will eventually match the familiar operation of $\lim$, at which point the 
formal limit symbol can be discarded. 


Definition 5.3.1 (Real numbers). A real number is defined to be an object of the 
form $\mathrm{LIM}_{n\to\infty} a_n$, where $(a_n)_{n=1}^\infty$ is a Cauchy sequence of rational numbers. Two real 
numbers $\mathrm{LIM}_{n\to\infty} a_n$ and $\mathrm{LIM}_{n\to\infty} b_n$ are said to be equal iff $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ 
are equivalent Cauchy sequences. The set of all real numbers is denoted $\mathbf{R}$. 


Example 5.3.2 (Informal) Let $a_1, a_2, a_3, \ldots$ denote the sequence 

1.4, 1.41, 1.414, 1.4142, 1.41421, ... 

and let $b_1, b_2, b_3, \ldots$ denote the sequence 

1.5, 1.42, 1.415, 1.4143, 1.41422, ... 


then $\mathrm{LIM}_{n\to\infty} a_n$ is a real number, and is the same real number as $\mathrm{LIM}_{n\to\infty} b_n$, 
because $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are equivalent Cauchy sequences: $\mathrm{LIM}_{n\to\infty} a_n = 
\mathrm{LIM}_{n\to\infty} b_n$. 


We will refer to $\mathrm{LIM}_{n\to\infty} a_n$ as the formal limit of the sequence $(a_n)_{n=1}^\infty$. Later on 
we will define a genuine notion of limit, and show that the formal limit of a Cauchy 
sequence is the same as the limit of that sequence; after that, we will not need formal 
limits ever again. (The situation is much like what we did with formal subtraction 
$--$ and formal division $//$.) 

In order to ensure that this definition is valid, we need to check that the notion of 
equality in the definition obeys the first three axioms of equality: 


Proposition 5.3.3 (Formal limits are well-defined). Let $x = \mathrm{LIM}_{n\to\infty} a_n$, $y = 
\mathrm{LIM}_{n\to\infty} b_n$, and $z = \mathrm{LIM}_{n\to\infty} c_n$ be real numbers. Then, with the above 
definition of equality for real numbers, we have $x = x$. Also, if $x = y$, then $y = x$. Finally, 
if $x = y$ and $y = z$, then $x = z$. 


Proof See Exercise 5.3.1. 


Because of this proposition, we know that our definition of equality between two 
real numbers is legitimate. Of course, when we define other operations on the reals, 
we have to check that they obey the axiom of substitution: two real number inputs 
which are equal should give equal outputs when applied to any operation on the real 
numbers. 

Now we want to define on the real numbers all the usual arithmetic operations, 
such as addition and multiplication. We begin with addition. 


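Definition 5.3.4 (Addition of reals). Let $x = \mathrm{LIM}_{n\to\infty} a_n$ and $y = \mathrm{LIM}_{n\to\infty} b_n$ be 
real numbers. Then we define the sum $x + y$ to be $x + y := \mathrm{LIM}_{n\to\infty} (a_n + b_n)$. 


Example 5.3.5 The sum of $\mathrm{LIM}_{n\to\infty} (1 + 1/n)$ and $\mathrm{LIM}_{n\to\infty} (2 + 3/n)$ is 
$\mathrm{LIM}_{n\to\infty} (3 + 4/n)$. 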


We now check that this definition is valid. The first thing we need to do is to 
confirm that the sum of two real numbers is in fact a real number: 


Lemma 5.3.6 (Sum of Cauchy sequences is Cauchy). Let $x = \mathrm{LIM}_{n\to\infty} a_n$ and $y = 
\mathrm{LIM}_{n\to\infty} b_n$ be real numbers. Then $x + y$ is also a real number (i.e., $(a_n + b_n)_{n=1}^\infty$ 
is a Cauchy sequence of rationals). 


Proof We need to show that for every $\varepsilon > 0$, the sequence $(a_n + b_n)_{n=1}^\infty$ is eventually 
$\varepsilon$-steady. Now from hypothesis we know that $(a_n)_{n=1}^\infty$ is eventually $\varepsilon$-steady, and 
$(b_n)_{n=1}^\infty$ is eventually $\varepsilon$-steady, but it turns out that this is not quite enough (this can 
be used to imply that $(a_n + b_n)_{n=1}^\infty$ is eventually $2\varepsilon$-steady, but that’s not what we 
want). So we need to do a little trick, which is to play with the value of $\varepsilon$. 

We know that $(a_n)_{n=1}^\infty$ is eventually $\delta$-steady for every value of $\delta$. This implies not 
only that $(a_n)_{n=1}^\infty$ is eventually $\varepsilon$-steady, but it is also eventually $\varepsilon/2$-steady. Similarly, 
the sequence $(b_n)_{n=1}^\infty$ is also eventually $\varepsilon/2$-steady. This will turn out to be enough 
to conclude that $(a_n + b_n)_{n=1}^\infty$ is eventually $\varepsilon$-steady. 

Since $(a_n)_{n=1}^\infty$ is eventually $\varepsilon/2$-steady, we know that there exists an $N \geq 1$ such 
that $(a_n)_{n=N}^\infty$ is $\varepsilon/2$-steady, i.e., $a_n$ and $a_m$ are $\varepsilon/2$-close for every $n, m \geq N$. Similarly 
there exists an $M \geq 1$ such that $(b_n)_{n=M}^\infty$ is $\varepsilon/2$-steady, i.e., $b_n$ and $b_m$ are $\varepsilon/2$-close 
for every $n, m \geq M$. 

Let $\max(N, M)$ be the larger of $N$ and $M$ (we know from Proposition 2.2.13 that 
one has to be greater than or equal to the other). If $n, m \geq \max(N, M)$, then we know 
that $a_n$ and $a_m$ are $\varepsilon/2$-close, and $b_n$ and $b_m$ are $\varepsilon/2$-close, and so by Proposition 
4.3.7 we see that $a_n + b_n$ and $a_m + b_m$ are $\varepsilon$-close for every $n, m \geq \max(N, M)$. 
This implies that the sequence $(a_n + b_n)_{n=1}^\infty$ is eventually $\varepsilon$-steady, as desired. 
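
The $\varepsilon/2$ trick in this proof can be phrased as a recipe for combining two "index finders". The Python sketch below is one possible model, not the text's own construction: the class `CauchySeq` and its `modulus` attribute are hypothetical, and it is an assumption of the sketch that each sequence comes packaged with a function returning, for every rational $\varepsilon > 0$, an $N$ from which the tail is $\varepsilon$-steady.

```python
from fractions import Fraction
from math import ceil


class CauchySeq:
    """A rational sequence n -> term(n), n >= 1, together with a modulus:
    modulus(eps) returns an N >= 1 such that the tail from N on is eps-steady."""

    def __init__(self, term, modulus):
        self.term = term
        self.modulus = modulus

    def __add__(self, other):
        # Termwise sum; the index for eps is max(N, M), where N and M are the
        # indices that make the two summands (eps/2)-steady, as in the proof.
        return CauchySeq(
            lambda n: self.term(n) + other.term(n),
            lambda eps: max(self.modulus(eps / 2), other.modulus(eps / 2)),
        )


# The two sequences of Example 5.3.5; eps is expected to be a Fraction.
x = CauchySeq(lambda n: 1 + Fraction(1, n), lambda eps: ceil(1 / eps))
y = CauchySeq(lambda n: 2 + Fraction(3, n), lambda eps: ceil(3 / eps))
s = x + y

print(s.term(4))                    # 4, i.e. 3 + 4/4
print(s.modulus(Fraction(1, 10)))   # 60
```

Here the moduli of `x` and `y` are justified by the estimate $|1/j - 1/k| \leq 1/N$ for $j, k \geq N$ from Proposition 5.1.11; the sum never needs to re-derive such an estimate, it only reuses the two given ones at $\varepsilon/2$.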


The other thing we need to check is the axiom of substitution (see Sect. A.7): if 
we replace a real number x by another number equal to x, this should not change the 
sum x + y (and similarly if we substitute y by another number equal to y). 


Lemma 5.3.7 (Sums of equivalent Cauchy sequences are equivalent). Let $x = 
\mathrm{LIM}_{n\to\infty} a_n$, $y = \mathrm{LIM}_{n\to\infty} b_n$, and $x' = \mathrm{LIM}_{n\to\infty} a'_n$ be real numbers. Suppose that 
$x = x'$. Then we have $x + y = x' + y$. 


Proof Since $x$ and $x'$ are equal, we know that the Cauchy sequences $(a_n)_{n=1}^\infty$ and 
$(a'_n)_{n=1}^\infty$ are equivalent, so in other words they are eventually $\varepsilon$-close for each $\varepsilon > 0$. 
We need to show that the sequences $(a_n + b_n)_{n=1}^\infty$ and $(a'_n + b_n)_{n=1}^\infty$ are eventually 
$\varepsilon$-close for each $\varepsilon > 0$. But we already know that there is an $N \geq 1$ such that $(a_n)_{n=N}^\infty$ 
and $(a'_n)_{n=N}^\infty$ are $\varepsilon$-close, i.e., that $a_n$ and $a'_n$ are $\varepsilon$-close for each $n \geq N$. Since $b_n$ 
is of course 0-close to $b_n$ (where we extend the notion of $\varepsilon$-closeness to the $\varepsilon = 0$ 
case in the obvious fashion), we thus see from Proposition 4.3.7 (extended to cover 
the 0-close case) that $a_n + b_n$ and $a'_n + b_n$ are $\varepsilon$-close for each $n \geq N$. This implies 
that $(a_n + b_n)_{n=1}^\infty$ and $(a'_n + b_n)_{n=1}^\infty$ are eventually $\varepsilon$-close for each $\varepsilon > 0$, and we are 
done. 


Remark 5.3.8 The above lemma verifies the axiom of substitution for the “$x$” 
variable in $x + y$, but one can similarly prove the axiom of substitution for the “$y$” 
variable. (A quick way is to observe from the definition of $x + y$ that we certainly 
have $x + y = y + x$, since $a_n + b_n = b_n + a_n$.) 


We can define multiplication of real numbers in a manner similar to that of addi- 
tion: 


Definition 5.3.9 (Multiplication of reals). Let $x = \mathrm{LIM}_{n\to\infty} a_n$ and $y = \mathrm{LIM}_{n\to\infty} b_n$ 
be real numbers. Then we define the product $xy$ to be $xy := \mathrm{LIM}_{n\to\infty} a_n b_n$. 


The following proposition ensures that this definition is valid, and that the product 
of two real numbers is in fact a real number: 


Proposition 5.3.10 (Multiplication is well-defined). Let $x = \mathrm{LIM}_{n\to\infty} a_n$, $y = 
\mathrm{LIM}_{n\to\infty} b_n$, and $x' = \mathrm{LIM}_{n\to\infty} a'_n$ be real numbers. Then $xy$ is also a real number. 
Furthermore, if $x = x'$, then $xy = x'y$. 


Proof See Exercise 5.3.2. 


Of course we can prove a similar substitution rule when y is replaced by a real 
number y’ which is equal to y. 

At this point we embed the rationals back into the reals, by equating every rational 
number $q$ with the real number $\mathrm{LIM}_{n\to\infty} q$. For instance, if $a_1, a_2, a_3, \ldots$ is the 
sequence 

0.5, 0.5, 0.5, 0.5, 0.5, ... 


then we set $\mathrm{LIM}_{n\to\infty} a_n$ equal to 0.5. This embedding is consistent with our 
definitions of addition and multiplication, since for any rational numbers $a$, $b$ we have 

$(\mathrm{LIM}_{n\to\infty} a) + (\mathrm{LIM}_{n\to\infty} b) = \mathrm{LIM}_{n\to\infty} (a + b)$ and 
$(\mathrm{LIM}_{n\to\infty} a) \times (\mathrm{LIM}_{n\to\infty} b) = \mathrm{LIM}_{n\to\infty} (ab)$; 

this means that when one wants to add or multiply two rational numbers $a$, $b$ it 
does not matter whether one thinks of these numbers as rationals or as the real 
numbers $\mathrm{LIM}_{n\to\infty} a$, $\mathrm{LIM}_{n\to\infty} b$. Also, this identification of rational numbers and 
real numbers is consistent with our definitions of equality (Exercise 5.3.3). 

We can now easily define negation $-x$ for real numbers $x$ by the formula 

$-x := (-1) \times x$, 

since $-1$ is a rational number and is hence real. Note that this is clearly consistent 
with our negation for rational numbers since we have $-q = (-1) \times q$ for all rational 
numbers $q$. Also, from our definitions it is clear that 

$-\mathrm{LIM}_{n\to\infty} a_n = \mathrm{LIM}_{n\to\infty} (-a_n)$ 

(why?). Once we have addition and negation, we can define subtraction as usual by 

$x - y := x + (-y)$; 

note that this implies 


$\mathrm{LIM}_{n\to\infty} a_n - \mathrm{LIM}_{n\to\infty} b_n = \mathrm{LIM}_{n\to\infty} (a_n - b_n)$. 


We can now easily show that the real numbers obey all the usual rules of algebra 
(except perhaps for the laws involving division, which we shall address shortly): 


Proposition 5.3.11 All the laws of algebra from Proposition 4.1.6 hold not only for 
the integers, but for the reals as well. 


Proof We illustrate this with one such rule: $x(y + z) = xy + xz$. Let $x = \mathrm{LIM}_{n\to\infty} a_n$, 
$y = \mathrm{LIM}_{n\to\infty} b_n$, and $z = \mathrm{LIM}_{n\to\infty} c_n$ be real numbers. Then by definition, $xy = 
\mathrm{LIM}_{n\to\infty} a_n b_n$ and $xz = \mathrm{LIM}_{n\to\infty} a_n c_n$, and so $xy + xz = \mathrm{LIM}_{n\to\infty} (a_n b_n + a_n c_n)$. 
A similar line of reasoning shows that $x(y + z) = \mathrm{LIM}_{n\to\infty} a_n (b_n + c_n)$. But we 
already know that $a_n (b_n + c_n)$ is equal to $a_n b_n + a_n c_n$ for the rational numbers $a_n$, 
$b_n$, $c_n$, and the claim follows. The other laws of algebra are proven similarly. 


The last basic arithmetic operation we need to define is reciprocation: $x \to x^{-1}$. 
This one is a little more subtle. One obvious first guess for how to proceed would be to 
define 

$(\mathrm{LIM}_{n\to\infty} a_n)^{-1} := \mathrm{LIM}_{n\to\infty} a_n^{-1}$, 


but there are a few problems with this. For instance, let $a_1, a_2, a_3, \ldots$ be the Cauchy 
sequence 

0.1, 0.01, 0.001, 0.0001, ..., 


and let $x := \mathrm{LIM}_{n\to\infty} a_n$. Then by this definition, $x^{-1}$ would be $\mathrm{LIM}_{n\to\infty} b_n$, where 
$b_1, b_2, b_3, \ldots$ is the sequence 


10, 100, 1000, 10000, ... 


but this is not a Cauchy sequence (it isn’t even bounded). Of course, the problem here 
is that our original Cauchy sequence $(a_n)_{n=1}^\infty$ was equivalent to the zero sequence 
$(0)_{n=1}^\infty$ (why?), and hence that our real number $x$ was in fact equal to 0. So we should 
only allow the operation of reciprocal when $x$ is non-zero. 

However, even when we restrict ourselves to non-zero real numbers, we have a 
slight problem, because a non-zero real number might be the formal limit of a Cauchy 
sequence which contains zero elements. For instance, the number 1, which is rational 
and hence real, is the formal limit $1 = \mathrm{LIM}_{n\to\infty} a_n$ of the Cauchy sequence 

0, 0.9, 0.99, 0.999, 0.9999, ... 


but using our naive definition of reciprocal, we cannot invert the real number 1, 
because we can’t invert the first element 0 of this Cauchy sequence! 

To get around these problems we need to keep our Cauchy sequence away from 
zero. To do this we first need a definition. 


Definition 5.3.12 (Sequences bounded away from zero). A sequence $(a_n)_{n=1}^\infty$ of 
rational numbers is said to be bounded away from zero iff there exists a rational 
number $c > 0$ such that $|a_n| \geq c$ for all $n \geq 1$. 


Examples 5.3.13 The sequence 1, -1, 1, -1, 1, -1, 1, ... is bounded away 
from zero (all the coefficients have absolute value at least 1). But the sequence 
0.1, 0.01, 0.001, ... is not bounded away from zero, and neither is 0, 0.9, 0.99, 0.999, 
0.9999, .... The sequence 10, 100, 1000, ... is bounded away from zero, but is not 
bounded. 


We now show that every non-zero real number is the formal limit of a Cauchy 
sequence bounded away from zero: 


Lemma 5.3.14 Let $x$ be a non-zero real number. Then $x = \mathrm{LIM}_{n\to\infty} a_n$ for some 
Cauchy sequence $(a_n)_{n=1}^\infty$ which is bounded away from zero. 


Proof Since $x$ is real, we know that $x = \mathrm{LIM}_{n\to\infty} b_n$ for some Cauchy sequence 
$(b_n)_{n=1}^\infty$. But we are not yet done, because we do not know that $b_n$ is bounded away 
from zero. On the other hand, we are given that $x \neq 0 = \mathrm{LIM}_{n\to\infty} 0$, which means 
that the sequence $(b_n)_{n=1}^\infty$ is not equivalent to $(0)_{n=1}^\infty$. Thus the sequence $(b_n)_{n=1}^\infty$ 
cannot be eventually $\varepsilon$-close to $(0)_{n=1}^\infty$ for every $\varepsilon > 0$. Therefore we can find an 
$\varepsilon > 0$ such that $(b_n)_{n=1}^\infty$ is not eventually $\varepsilon$-close to $(0)_{n=1}^\infty$. 

Let us fix this $\varepsilon$. We know that $(b_n)_{n=1}^\infty$ is a Cauchy sequence, so it is eventually 
$\varepsilon$-steady. Moreover, it is eventually $\varepsilon/2$-steady, since $\varepsilon/2 > 0$. Thus there is an $N \geq 1$ 
such that $|b_n - b_m| \leq \varepsilon/2$ for all $n, m \geq N$. 

On the other hand, we cannot have $|b_n| \leq \varepsilon$ for all $n \geq N$, since this would imply 
that $(b_n)_{n=1}^\infty$ is eventually $\varepsilon$-close to $(0)_{n=1}^\infty$. Thus there must be some $n_0 \geq N$ for 
which $|b_{n_0}| > \varepsilon$. Since we already know that $|b_{n_0} - b_n| \leq \varepsilon/2$ for all $n \geq N$, we thus 
conclude from the triangle inequality (how?) that $|b_n| \geq \varepsilon/2$ for all $n \geq N$. 

This almost proves that $(b_n)_{n=1}^\infty$ is bounded away from zero. Actually, what it 
does is show that $(b_n)_{n=1}^\infty$ is eventually bounded away from zero. But this is easily 
fixed, by defining a new sequence $a_n$ by setting $a_n := \varepsilon/2$ if $n < N$ and $a_n := b_n$ if 
$n \geq N$. Since $b_n$ is a Cauchy sequence, it is not hard to verify that $a_n$ is also a Cauchy 
sequence which is equivalent to $b_n$ (because the two sequences are eventually the 
same), and so $x = \mathrm{LIM}_{n\to\infty} a_n$. And since $|b_n| \geq \varepsilon/2$ for all $n \geq N$, we know that 
$|a_n| \geq \varepsilon/2$ for all $n \geq 1$ (splitting into the two cases $n \geq N$ and $n < N$ separately). 
Thus we have a Cauchy sequence which is bounded away from zero (by $\varepsilon/2$ instead 
of $\varepsilon$, but that’s still OK since $\varepsilon/2 > 0$), and which has $x$ as a formal limit, and so we 
are done. 
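
The patching step at the end of this proof is simple enough to write down directly. In the Python sketch below, the function name `patch_away_from_zero` is hypothetical, and the values of $\varepsilon$ and $N$ are supplied by hand as assumptions: the proof only asserts that suitable values exist, and the code does not attempt to find them. The example uses the sequence 0, 0.9, 0.99, ... discussed earlier, with $\varepsilon = 1/2$ and $N = 2$.

```python
from fractions import Fraction


def b(n):
    """The sequence 0, 0.9, 0.99, 0.999, ... indexed from n = 1."""
    return 1 - Fraction(1, 10 ** (n - 1))


def patch_away_from_zero(seq, eps, N):
    """Return the sequence a_n := eps/2 for n < N and a_n := seq(n) for n >= N.

    Assumes (without checking) that eps and N are as in the proof of Lemma 5.3.14,
    so that |seq(n)| >= eps/2 for every n >= N.
    """
    half = eps / 2
    return lambda n: half if n < N else seq(n)


eps, N = Fraction(1, 2), 2      # hand-picked: from n = 2 on, the terms are >= 9/10
a = patch_away_from_zero(b, eps, N)

print(b(1), a(1))                                        # 0 1/4
print(a(2))                                              # 9/10
print(all(abs(a(n)) >= eps / 2 for n in range(1, 60)))   # True
```

Only finitely many terms are changed, which is why the patched sequence is equivalent to the original one and represents the same real number.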


Once a sequence is bounded away from zero, we can take its reciprocal without 
any difficulty: 


Lemma 5.3.15 Suppose that $(a_n)_{n=1}^\infty$ is a Cauchy sequence which is bounded away 
from zero. Then the sequence $(a_n^{-1})_{n=1}^\infty$ is also a Cauchy sequence. 


Proof Since $(a_n)_{n=1}^\infty$ is bounded away from zero, we know that there is a $c > 0$ such 
that $|a_n| \geq c$ for all $n \geq 1$. Now we need to show that $(a_n^{-1})_{n=1}^\infty$ is eventually $\varepsilon$-steady 
for each $\varepsilon > 0$. Thus let us fix an $\varepsilon > 0$; our task is now to find an $N \geq 1$ such that 
$|a_n^{-1} - a_m^{-1}| \leq \varepsilon$ for all $n, m \geq N$. But 


$|a_n^{-1} - a_m^{-1}| = \left| \frac{a_m - a_n}{a_m a_n} \right| \leq \frac{|a_m - a_n|}{c^2}$ 


(since $|a_m|, |a_n| \geq c$), and so to make $|a_n^{-1} - a_m^{-1}|$ less than or equal to $\varepsilon$, it will 
suffice to make $|a_m - a_n|$ less than or equal to $c^2 \varepsilon$. But since $(a_n)_{n=1}^\infty$ is a Cauchy 
sequence, and $c^2 \varepsilon > 0$, we can certainly find an $N$ such that the sequence $(a_n)_{n=N}^\infty$ 
is $c^2 \varepsilon$-steady, i.e., $|a_m - a_n| \leq c^2 \varepsilon$ for all $m, n \geq N$. By what we have said above, this 
shows that $|a_n^{-1} - a_m^{-1}| \leq \varepsilon$ for all $m, n \geq N$, and hence the sequence $(a_n^{-1})_{n=1}^\infty$ is 
eventually $\varepsilon$-steady. Since we have proven this for every $\varepsilon$, we have that $(a_n^{-1})_{n=1}^\infty$ is 
a Cauchy sequence, as desired. 
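
The key estimate of this proof says that an index which makes the original sequence $c^2 \varepsilon$-steady also makes the reciprocal sequence $\varepsilon$-steady. The Python sketch below illustrates this under explicit assumptions: `reciprocal_sequence` is a hypothetical helper, the caller must supply both a lower bound `c` with $|a_n| \geq c$ and a function `modulus` returning a valid steadiness index for each rational $\varepsilon > 0$, and neither assumption is verified by the code.

```python
from fractions import Fraction
from math import ceil


def reciprocal_sequence(term, modulus, c):
    """Given term(n) with |term(n)| >= c > 0 and a steadiness modulus for it,
    return (reciprocal term, modulus for the reciprocals) as in Lemma 5.3.15."""
    c = Fraction(c)
    return (lambda n: 1 / term(n), lambda eps: modulus(c * c * eps))


# Example: a_n = 1 + 1/n is bounded away from zero by c = 1, and its tail from
# N >= 1/eps onwards is eps-steady (compare Proposition 5.1.11).
term = lambda n: 1 + Fraction(1, n)
modulus = lambda eps: ceil(1 / eps)
inv_term, inv_modulus = reciprocal_sequence(term, modulus, 1)

eps = Fraction(1, 100)
N = inv_modulus(eps)
print(inv_term(1))   # 1/2
print(N)             # 100
print(all(abs(inv_term(n) - inv_term(m)) <= eps
          for n in range(N, N + 60) for m in range(N, N + 60)))   # True
```

A smaller lower bound `c` is still valid but gives a larger index, since the tolerance $c^2 \varepsilon$ handed to the original sequence shrinks.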


We are now ready to define reciprocation: 


Definition 5.3.16 (Reciprocals of real numbers). Let $x$ be a non-zero real number. 
Let $(a_n)_{n=1}^\infty$ be a Cauchy sequence bounded away from zero such that $x = 
\mathrm{LIM}_{n\to\infty} a_n$ (such a sequence exists by Lemma 5.3.14). Then we define the 
reciprocal $x^{-1}$ by the formula $x^{-1} := \mathrm{LIM}_{n\to\infty} a_n^{-1}$. (From Lemma 5.3.15 we know that 
$x^{-1}$ is a real number.) 

We need to check one thing before we are sure this definition makes sense: what 
if there are two different Cauchy sequences $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ which have $x$ as 
their formal limit, $x = \mathrm{LIM}_{n\to\infty} a_n = \mathrm{LIM}_{n\to\infty} b_n$? The above definition might 
conceivably give two different reciprocals $x^{-1}$, namely $\mathrm{LIM}_{n\to\infty} a_n^{-1}$ and $\mathrm{LIM}_{n\to\infty} b_n^{-1}$. 
Fortunately, this never happens: 

Lemma 5.3.17 (Reciprocation is well-defined). Let $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ be two 
Cauchy sequences bounded away from zero such that $\mathrm{LIM}_{n\to\infty} a_n = \mathrm{LIM}_{n\to\infty} b_n$ 
(i.e., the two sequences are equivalent). Then $\mathrm{LIM}_{n\to\infty} a_n^{-1} = \mathrm{LIM}_{n\to\infty} b_n^{-1}$. 


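Proof Consider the following product $P$ of three real numbers: 

$P := (\mathrm{LIM}_{n\to\infty} a_n^{-1}) \times (\mathrm{LIM}_{n\to\infty} a_n) \times (\mathrm{LIM}_{n\to\infty} b_n^{-1})$. 

If we multiply this out, we obtain 

$P = \mathrm{LIM}_{n\to\infty} a_n^{-1} a_n b_n^{-1} = \mathrm{LIM}_{n\to\infty} b_n^{-1}$. 

On the other hand, since $\mathrm{LIM}_{n\to\infty} a_n = \mathrm{LIM}_{n\to\infty} b_n$, we can write $P$ in another way 
as 

$P = (\mathrm{LIM}_{n\to\infty} a_n^{-1}) \times (\mathrm{LIM}_{n\to\infty} b_n) \times (\mathrm{LIM}_{n\to\infty} b_n^{-1})$ 

(cf. Proposition 5.3.10). Multiplying things out again, we get 


$P = \mathrm{LIM}_{n\to\infty} a_n^{-1} b_n b_n^{-1} = \mathrm{LIM}_{n\to\infty} a_n^{-1}$. 


Comparing our different formulae for $P$ we see that $\mathrm{LIM}_{n\to\infty} a_n^{-1} = \mathrm{LIM}_{n\to\infty} b_n^{-1}$, 
as desired. 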


Thus reciprocal is well-defined (for each non-zero real number $x$, we have 
exactly one definition of the reciprocal $x^{-1}$). Note it is clear from the definition that 
$x x^{-1} = x^{-1} x = 1$ (why?); thus all the field axioms (Proposition 4.2.4) apply to the 
reals as well as to the rationals. We of course cannot give 0 a reciprocal, since 0 
multiplied by anything gives 0, not 1. Also note that if $q$ is a non-zero rational, 
and hence equal to the real number $\mathrm{LIM}_{n\to\infty} q$, then the reciprocal of $\mathrm{LIM}_{n\to\infty} q$ is 
$\mathrm{LIM}_{n\to\infty} q^{-1} = q^{-1}$; thus the operation of reciprocal on real numbers is consistent 
with the operation of reciprocal on rational numbers. 

Once one has reciprocal, one can define division x/y of two real numbers x, y, 
provided y is non-zero, by the formula 


$$x/y := x \times y^{-1},$$


just as we did with the rationals. In particular, we have the cancelation law: if x, y, 
z are real numbers such that xz = yz, and z is non-zero, then by dividing by z we 
conclude that x = y. Note that this cancelation law does not work when z is zero. 

We now have all four of the basic arithmetic operations on the reals: addition, 
subtraction, multiplication, and division, with all the usual rules of algebra. Next we 
turn to the notion of order on the reals. 


— Exercises — 
Exercise 5.3.1 Prove Proposition 5.3.3. (Hint: you may find Proposition 4.3.7 to be useful.) 
Exercise 5.3.2. Prove Proposition 5.3.10. (Hint: again, Proposition 4.3.7 may be useful.) 


Exercise 5.3.3 Let $a, b$ be rational numbers. Show that $a = b$ if and only if
$\mathrm{LIM}_{n\to\infty} a = \mathrm{LIM}_{n\to\infty} b$ (i.e., the Cauchy sequences $a, a, a, a, \ldots$ and $b, b, b, b, \ldots$ are
equivalent if and only if $a = b$). This allows us to embed the rational numbers inside the real
numbers in a well-defined manner.


Exercise 5.3.4 Let $(a_n)_{n=0}^\infty$ be a sequence of rational numbers which is bounded. Let $(b_n)_{n=0}^\infty$ be
another sequence of rational numbers which is equivalent to $(a_n)_{n=0}^\infty$. Show that $(b_n)_{n=0}^\infty$ is also
bounded. (Hint: use Exercise 5.2.2.)


Exercise 5.3.5 Show that $\mathrm{LIM}_{n\to\infty} 1/n = 0$.


5.4 Ordering the Reals 


We know that every rational number is positive, negative, or zero. We now want to 
say the same thing for the reals: each real number should be positive, negative, or 
zero. Since a real number $x$ is just a formal limit of rationals $a_n$, it is tempting to
make the following definition: a real number $x = \mathrm{LIM}_{n\to\infty} a_n$ is positive if all of the
$a_n$ are positive, and negative if all of the $a_n$ are negative (and zero if all of the $a_n$ are
zero). However, one soon realizes some problems with this definition. For instance,
the sequence $(a_n)_{n=1}^\infty$ defined by $a_n := 10^{-n}$, thus

0.1, 0.01, 0.001, 0.0001, ...

consists entirely of positive numbers, but this sequence is equivalent to the zero
sequence 0, 0, 0, 0, ... and thus $\mathrm{LIM}_{n\to\infty} a_n = 0$. Thus even though all the rationals
were positive, the real formal limit of these rationals was zero rather than positive.
Another example is 

0.1, -0.01, 0.001, -0.0001, ...;


this sequence is a hybrid of positive and negative numbers, but again the formal limit 
is zero. 
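
To see concretely why both sequences are equivalent to the zero sequence, one can compute, for a given rational $\varepsilon > 0$, an index from which the terms stay $\varepsilon$-close to 0. The Python sketch below does this for terms of magnitude $10^{-n}$; the function name is hypothetical, and the search relies on the magnitudes being decreasing, which is specific to these two examples.

```python
from fractions import Fraction

def first_index_within(eps):
    """Smallest n >= 1 with 10^{-n} <= eps.

    Since the magnitudes 10^{-n} decrease with n, every later term of
    0.1, 0.01, 0.001, ... (or of the alternating-sign variant) is also
    eps-close to 0.
    """
    n = 1
    while Fraction(1, 10**n) > eps:
        n += 1
    return n

for eps in (Fraction(1, 2), Fraction(1, 1000), Fraction(3, 10**7)):
    print(eps, first_index_within(eps))
```

Because such an index exists for every positive rational $\varepsilon$, each of the two sequences is eventually $\varepsilon$-close to 0, 0, 0, ... for every $\varepsilon > 0$.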

The trick, as with the reciprocals in the previous section, is to limit one’s attention 
to sequences which are bounded away from zero. 


Definition 5.4.1 Let $(a_n)_{n=1}^\infty$ be a sequence of rationals. We say that this sequence
is positively bounded away from zero iff we have a positive rational $c > 0$ such that
$a_n \ge c$ for all $n \ge 1$ (in particular, the sequence is entirely positive). The sequence
is negatively bounded away from zero iff we have a negative rational $-c < 0$ such
that $a_n \le -c$ for all $n \ge 1$ (in particular, the sequence is entirely negative).


Examples 5.4.2. The sequence 1.1, 1.01, 1.001, 1.0001, ... is positively bounded
away from zero (all terms are greater than or equal to 1). The sequence -1.1, -1.01,
-1.001, -1.0001, ... is negatively bounded away from zero. The sequence 1, -1, 1,
-1, 1, -1, ... is bounded away from zero but is neither positively bounded away
from zero nor negatively bounded away from zero.


It is clear that any sequence which is positively or negatively bounded away from 
zero is bounded away from zero. Also, a sequence cannot be both positively bounded 
away from zero and negatively bounded away from zero at the same time. 


Definition 5.4.3 A real number $x$ is said to be positive iff it can be written as
$x = \mathrm{LIM}_{n\to\infty} a_n$ for some Cauchy sequence $(a_n)_{n=1}^\infty$ which is positively bounded away
from zero. $x$ is said to be negative iff it can be written as $x = \mathrm{LIM}_{n\to\infty} a_n$ for some
sequence $(a_n)_{n=1}^\infty$ which is negatively bounded away from zero.


Proposition 5.4.4 (Basic properties of positive reals). For every real number $x$,
exactly one of the following three statements is true: (a) $x$ is zero; (b) $x$ is positive;
(c) $x$ is negative. A real number $x$ is negative if and only if $-x$ is positive. If $x$ and $y$
are positive, then so are $x + y$ and $xy$.


Proof See Exercise 5.4.1. 


Note that if $q$ is a positive rational number, then the Cauchy sequence $q, q, q, \ldots$
is positively bounded away from zero, and hence $\mathrm{LIM}_{n\to\infty} q = q$ is a positive real
number. Thus the notion of positivity for rationals is consistent with that for reals.
Similarly, the notion of negativity for rationals is consistent with that for reals.

Once we have defined positive and negative numbers, we can define absolute 
value and order. 


Definition 5.4.5 (Absolute value). Let $x$ be a real number. We define the absolute
value $|x|$ of $x$ to equal $x$ if $x$ is positive, $-x$ when $x$ is negative, and 0 when $x$ is zero.


Definition 5.4.6 (Ordering of the real numbers). Let $x$ and $y$ be real numbers. We
say that $x$ is greater than $y$, and write $x > y$, iff $x - y$ is a positive real number, and
$x < y$ iff $x - y$ is a negative real number. We define $x \ge y$ iff $x > y$ or $x = y$, and
similarly define $x \le y$.


Comparing this with the definition of order on the rationals from Definition 4.2.8 
we see that order on the reals is consistent with order on the rationals, i.e., if two 
rational numbers $q, q'$ are such that $q$ is less than $q'$ in the rational number system,
then $q$ is still less than $q'$ in the real number system, and similarly for “greater than”.
In the same way we see that the definition of absolute value given here is consistent 
with that in Definition 4.3.1. 


Proposition 5.4.7 All the claims in Proposition 4.2.9 which held for rationals con- 
tinue to hold for real numbers. 


Proof We just prove one of the claims and leave the rest to Exercise 5.4.2. Suppose
we have $x < y$ and $z$ a positive real, and want to conclude that $xz < yz$. Since $x < y$,
$y - x$ is positive, hence by Proposition 5.4.4 we have that $(y - x)z = yz - xz$ is positive,
hence $xz < yz$.


As an application of these propositions, we prove 


Proposition 5.4.8 Let $x$ be a positive real number. Then $x^{-1}$ is also positive. Also,
if $y$ is another positive number and $x > y$, then $x^{-1} < y^{-1}$.


Proof Let $x$ be positive. Since $xx^{-1} = 1$, the real number $x^{-1}$ cannot be zero (since
$x0 = 0 \ne 1$). Also, from Proposition 5.4.4 it is easy to see that a positive number
times a negative number is negative; this shows that $x^{-1}$ cannot be negative, since
this would imply that $xx^{-1} = 1$ is negative, a contradiction. Thus, by Proposition
5.4.4, the only possibility left is that $x^{-1}$ is positive.

Now let $y$ be positive as well, so $x^{-1}$ and $y^{-1}$ are also positive. Suppose that
$x > y$. If $x^{-1} \ge y^{-1}$, then by Proposition 5.4.7 we have $xx^{-1} > yx^{-1} \ge yy^{-1}$, thus
$1 > 1$, which is a contradiction. Thus we must have $x^{-1} < y^{-1}$.


Another application is that the laws of exponentiation (Proposition 4.3.12) that 
were previously proven for rationals, are also true for reals; see Sect. 5.6. 

We have already seen that the formal limit of positive rationals need not be positive; 
it could be zero, as the example 0.1, 0.01, 0.001, ... showed. However, the formal 
limit of non-negative rationals (i.e., rationals that are either positive or zero) is non- 
negative. 


Proposition 5.4.9 Let $a_1, a_2, a_3, \ldots$ be a Cauchy sequence of non-negative rational
numbers. Then $\mathrm{LIM}_{n\to\infty} a_n$ is a non-negative real number.


Eventually, we will see a better explanation of this fact: the set of non-negative
reals is closed, whereas the set of positive reals is open. See Sect. 1.2.


Proof We argue by contradiction, and suppose that the real number $x := \mathrm{LIM}_{n\to\infty} a_n$
is a negative number. Then by definition of negative real number, we have
$x = \mathrm{LIM}_{n\to\infty} b_n$ for some sequence $b_n$ which is negatively bounded away from zero, i.e.,
there is a negative rational $-c < 0$ such that $b_n \le -c$ for all $n \ge 1$. On the other
hand, we have $a_n \ge 0$ for all $n \ge 1$, by hypothesis. Thus the numbers $a_n$ and $b_n$
are never $c/2$-close, since $c/2 < c$. Thus the sequences $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are not
eventually $c/2$-close. Since $c/2 > 0$, this implies that $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are not
equivalent. But this contradicts the fact that both these sequences have $x$ as their
formal limit.


Corollary 5.4.10 Let $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ be Cauchy sequences of rationals such
that $a_n \ge b_n$ for all $n \ge 1$. Then $\mathrm{LIM}_{n\to\infty} a_n \ge \mathrm{LIM}_{n\to\infty} b_n$.


Proof Apply Proposition 5.4.9 to the sequence $a_n - b_n$.


Remark 5.4.11 Note that the above corollary does not work if the $\ge$ signs are
replaced by $>$: for instance if $a_n := 1 + 1/n$ and $b_n := 1 - 1/n$, then $a_n$ is always
strictly greater than $b_n$, but the formal limit of $a_n$ is not greater than the formal limit
of $b_n$; instead they are equal.


We now define distance $d(x, y) := |x - y|$ just as we did for the rationals. In
fact, Propositions 4.3.3 and 4.3.7 hold not only for the rationals, but for the reals; the 
proof is identical, since the real numbers obey all the laws of algebra and order that 
the rationals do. 

We now observe that while positive real numbers can be arbitrarily large or small, 
they cannot be larger than all of the positive integers, or smaller in magnitude than 
all of the positive rationals: 


Proposition 5.4.12 (Bounding of reals by rationals). Let $x$ be a positive real number.
Then there exists a positive rational number $q$ such that $q \le x$, and there exists a
positive integer $N$ such that $x \le N$.


Proof Since $x$ is a positive real, it is the formal limit of some Cauchy sequence
$(a_n)_{n=1}^\infty$ which is positively bounded away from zero. Also, by Lemma 5.1.15, this
sequence is bounded. Thus we have rationals $q > 0$ and $r$ such that $q \le a_n \le r$
for all $n \ge 1$. But by Proposition 4.4.1 we know that there is some integer $N$ such
that $r \le N$; since $q$ is positive and $q \le r \le N$, we see that $N$ is positive. Thus
$q \le a_n \le N$ for all $n \ge 1$. Applying Corollary 5.4.10 we obtain that $q \le x \le N$, as
desired.


Corollary 5.4.13 (Archimedean property). Let $x$ be a real number, and let $\varepsilon$ be a
positive real number. Then there exists a positive integer $M$ such that $M\varepsilon > x$.


Proof If $x$ is zero or negative, one can just take $M = 1$, so suppose that $x$ is positive.
Then the number $x/\varepsilon$ is positive, and hence by Proposition 5.4.12 there exists a
positive integer $N$ such that $x/\varepsilon \le N$. If we set $M := N + 1$, then $x/\varepsilon < M$. Now
multiply by $\varepsilon$.


This property is quite important; it says that no matter how large $x$ is and how
small $\varepsilon$ is, if one keeps adding $\varepsilon$ to itself, one will eventually overtake $x$.
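
For rational inputs the integer $M$ can be computed directly. The following Python sketch is only an illustration under that assumption (the corollary itself concerns arbitrary reals); the function name is hypothetical, and the choice $M = \lfloor x/\varepsilon \rfloor + 1$ is one convenient integer exceeding $x/\varepsilon$ rather than the $N + 1$ of the proof.

```python
from fractions import Fraction
from math import floor

def archimedean_multiplier(x, eps):
    """Return a positive integer M with M * eps > x, for rational x and eps > 0."""
    x, eps = Fraction(x), Fraction(eps)
    if eps <= 0:
        raise ValueError("eps must be positive")
    if x <= 0:
        return 1                   # as in the proof: M = 1 already works
    return floor(x / eps) + 1      # an integer strictly greater than x / eps

M = archimedean_multiplier(Fraction(1000), Fraction(1, 7))
print(M, M * Fraction(1, 7) > 1000)
```

Here adding $1/7$ to itself $M$ times overtakes 1000, however small the step and however large the target.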


Proposition 5.4.14 Given any two real numbers $x < y$, we can find a rational number
$q$ such that $x < q < y$.


Proof See Exercise 5.4.5. 


We have now completed our construction of the real numbers. This number system 
contains the rationals and has almost everything that the rational number system has: 
the arithmetic operations, the laws of algebra, the laws of order. However, we have 
not yet demonstrated any advantages that the real numbers have over the rationals; 
so far, even after much effort, all we have done is shown that they are at least as good 
as the rational number system. But in the next few sections we show that the real 
numbers can do more things than rationals: for example, we can take square roots in 
a real number system. 


Remark 5.4.15 Up until now, we have not addressed the fact that real numbers can 
be expressed using the decimal system. For instance, the formal limit of 


1.4, 1.41, 1.414, 1.4142, 1.41421,... 


is more conventionally represented as the decimal 1.41421 .... We will address this 
in an Appendix (B), but for now let us just remark that there are some subtleties in 
the decimal system, for instance 0.9999... and 1.000... are in fact the same real 
number. 
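
The last claim can be made plausible with a short computation in the spirit of the formal limits used above. This Python sketch (the helper name is hypothetical) compares the truncations 0.9, 0.99, 0.999, ... with the constant sequence 1, 1, 1, ... using exact rationals.

```python
from fractions import Fraction

def nines(n):
    """The rational 0.99...9 with n nines, i.e. 1 - 10^{-n}."""
    return Fraction(10**n - 1, 10**n)

# Gap between the constant sequence 1, 1, 1, ... and the truncations.
gaps = [1 - nines(n) for n in range(1, 6)]
print(gaps)
```

The gap at stage $n$ is exactly $10^{-n}$, so the two sequences are equivalent and have the same formal limit.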


— Exercises — 


Exercise 5.4.1 Prove Proposition 5.4.4. (Hint: if $x$ is not zero, and $x$ is the formal limit of some
sequence $(a_n)_{n=1}^\infty$, then this sequence cannot be eventually $\varepsilon$-close to the zero sequence $(0)_{n=1}^\infty$
for every single $\varepsilon > 0$. Use this to show that the sequence $(a_n)_{n=1}^\infty$ is eventually either positively
bounded away from zero or negatively bounded away from zero.)


Exercise 5.4.2. Prove the remaining claims in Proposition 5.4.7. 


Exercise 5.4.3 Show that for every real number $x$ there is exactly one integer $N$ such that
$N \le x < N + 1$. (This integer $N$ is called the integer part of $x$ and is sometimes denoted $N = \lfloor x \rfloor$.)


Exercise 5.4.4 Show that for any positive real number $x > 0$ there exists a positive integer $N$ such
that $x > 1/N > 0$.


Exercise 5.4.5 Prove Proposition 5.4.14. (Hint: use Exercise 5.4.4. You may also need to argue by 
contradiction.) 


Exercise 5.4.6 Let $x, y$ be real numbers and let $\varepsilon > 0$ be a positive real. Show that $|x - y| < \varepsilon$ if
and only if $y - \varepsilon < x < y + \varepsilon$, and that $|x - y| \le \varepsilon$ if and only if $y - \varepsilon \le x \le y + \varepsilon$.


Exercise 5.4.7 Let $x$ and $y$ be real numbers. Show that $x \le y + \varepsilon$ for all real numbers $\varepsilon > 0$ if and
only if $x \le y$. Show that $|x - y| \le \varepsilon$ for all real numbers $\varepsilon > 0$ if and only if $x = y$.


Exercise 5.4.8 Let $(a_n)_{n=1}^\infty$ be a Cauchy sequence of rationals, and let $x$ be a real number. Show
that if $a_n \le x$ for all $n \ge 1$, then $\mathrm{LIM}_{n\to\infty} a_n \le x$. Similarly, show that if $a_n \ge x$ for all $n \ge 1$, then
$\mathrm{LIM}_{n\to\infty} a_n \ge x$. (Hint: prove by contradiction. Use Proposition 5.4.14 to find a rational between
$\mathrm{LIM}_{n\to\infty} a_n$ and $x$, and then use Proposition 5.4.9 or Corollary 5.4.10.)


Exercise 5.4.9 If $x, y$ are real numbers, define the maximum $\max(x, y)$ of $x$ and $y$ to equal $x$ if
$x \ge y$, and $y$ if $x < y$. Similarly, define the minimum $\min(x, y)$ of $x$ and $y$ to equal $x$ if $x \le y$, and
$y$ if $x > y$.

(i) If $x, y$ are real numbers, show that $\max(x, y) = -\min(-x, -y)$ and $\min(x, y) = -\max(-x, -y)$.

(ii) If $x, y, z$ are real numbers, show that $\max(x, y) = \max(y, x)$, $\max(x, x) = x$, and
$\max(x + z, y + z) = \max(x, y) + z$. If $z$ is non-negative, show that $\max(xz, yz) = z \max(x, y)$. What
happens to the last claim if $z$ is negative?

(iii) Show that all the claims in (ii) also hold if max is replaced with min.

(iv) If $x, y$ are positive real numbers, show that $\max(x, y)^{-1} = \min(x^{-1}, y^{-1})$ and
$\min(x, y)^{-1} = \max(x^{-1}, y^{-1})$.


5.5 The Least Upper Bound Property 


We now give one of the most basic advantages of the real numbers over the rationals; 
one can take the least upper bound sup(E) of any (non-empty, upper-bounded) subset 
E of the real numbers R. 


Definition 5.5.1 (Upper bound). Let $E$ be a subset of $\mathbf{R}$, and let $M$ be a real number.
We say that $M$ is an upper bound for $E$, iff we have $x \le M$ for every element $x$ in
$E$.


Example 5.5.2 Let $E$ be the interval $E := \{x \in \mathbf{R} : 0 \le x \le 1\}$. Then 1 is an upper
bound for $E$, since every element of $E$ is less than or equal to 1. It is also true that 2
is an upper bound for $E$, and indeed every number greater or equal to 1 is an upper
bound for $E$. On the other hand, any other number, such as 0.5, is not an upper bound,
because 0.5 is not larger than every element in $E$. (Merely being larger than some
elements of $E$ is not necessarily enough to make 0.5 an upper bound.)


Example 5.5.3 Let $\mathbf{R}^+$ be the set of positive reals: $\mathbf{R}^+ := \{x \in \mathbf{R} : x > 0\}$. Then
$\mathbf{R}^+$ does not have any upper bounds³ at all (why?).


Example 5.5.4 Let $\emptyset$ be the empty set. Then every number $M$ is an upper bound for
$\emptyset$, because $M$ is greater than every element of the empty set (this is a vacuously true
statement, but still true).


It is clear that if $M$ is an upper bound of $E$, then any larger number $M' \ge M$ is also
an upper bound of $E$. On the other hand, it is not so clear whether it is also possible
for any number smaller than $M$ to also be an upper bound of $E$. This motivates the
following definition:


3 More precisely, $\mathbf{R}^+$ has no upper bounds which are real numbers. In Sect. 6.2 we shall introduce
the extended real number system $\mathbf{R}^*$, which allows one to give the upper bound of $+\infty$ for sets
such as $\mathbf{R}^+$.


Definition 5.5.5 (Least upper bound). Let $E$ be a subset of $\mathbf{R}$, and $M$ be a real
number. We say that $M$ is a least upper bound for $E$ iff (a) $M$ is an upper bound for
$E$, and also (b) any other upper bound $M'$ for $E$ must be larger than or equal to $M$.


Example 5.5.6 Let $E$ be the interval $E := \{x \in \mathbf{R} : 0 \le x \le 1\}$. Then, as noted
before, $E$ has many upper bounds, indeed every number greater than or equal to 1
is an upper bound. But only 1 is the least upper bound; all other upper bounds are
larger than 1.


Example 5.5.7 The empty set does not have a least upper bound (why?). 


Proposition 5.5.8 (Uniqueness of least upper bound). Let E be a subset of R. Then 
E can have at most one least upper bound. 


Proof Suppose that $E$ has two least upper bounds, say $M_1$ and $M_2$. Since $M_1$ is a
least upper bound and $M_2$ is an upper bound, then by definition of least upper bound
we have $M_2 \ge M_1$. Since $M_2$ is a least upper bound and $M_1$ is an upper bound,
we similarly have $M_1 \ge M_2$. Thus $M_1 = M_2$. Thus there is at most one least upper
bound.


Now we come to an important property of the real numbers: 


Theorem 5.5.9 (Existence of least upper bound). Let $E$ be a non-empty subset of
$\mathbf{R}$. If $E$ has an upper bound (i.e., $E$ has some upper bound $M$), then it must have
exactly one least upper bound.


Proof This theorem will take quite a bit of effort to prove, and many of the steps 
will be left as exercises. 

Let $E$ be a non-empty subset of $\mathbf{R}$ with an upper bound $M$. By Proposition 5.5.8,
we know that $E$ has at most one least upper bound; we have to show that $E$ has at
least one least upper bound. Since $E$ is non-empty, we can choose some element $x_0$
in $E$.

Let $n \ge 1$ be a positive integer. We know that $E$ has an upper bound $M$. By
the Archimedean property (Corollary 5.4.13), we can find an integer $K$ such that
$K/n \ge M$, and hence $K/n$ is also an upper bound for $E$. By the Archimedean
property again, there exists another integer $L$ such that $L/n < x_0$. Since $x_0$ lies in
$E$, we see that $L/n$ is not an upper bound for $E$. Since $K/n$ is an upper bound but
$L/n$ is not, we see that $K > L$.

Since $K/n$ is an upper bound for $E$ and $L/n$ is not, we can find an integer
$L < m_n \le K$ with the property that $m_n/n$ is an upper bound for $E$, but $(m_n - 1)/n$
is not (see Exercise 5.5.2). In fact, this integer $m_n$ is unique (Exercise 5.5.3). We
subscript $m_n$ by $n$ to emphasize the fact that this integer $m$ depends on the choice of
$n$. This gives a well-defined (and unique) sequence $m_1, m_2, m_3, \ldots$ of integers, with
each of the $m_n/n$ being upper bounds and each of the $(m_n - 1)/n$ not being upper
bounds.

Now let $N \ge 1$ be a positive integer, and let $n, n' \ge N$ be integers larger than or
equal to $N$. Since $m_n/n$ is an upper bound for $E$ and $(m_{n'} - 1)/n'$ is not, we must
have $m_n/n > (m_{n'} - 1)/n'$ (why?). After a little algebra, this implies that

$$\frac{m_n}{n} - \frac{m_{n'}}{n'} > -\frac{1}{n'} \ge -\frac{1}{N}.$$

Similarly, since $m_{n'}/n'$ is an upper bound for $E$ and $(m_n - 1)/n$ is not, we have
$m_{n'}/n' > (m_n - 1)/n$, and hence

$$\frac{m_n}{n} - \frac{m_{n'}}{n'} < \frac{1}{n} \le \frac{1}{N}.$$

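Putting these two bounds together, we see that

$$\left| \frac{m_n}{n} - \frac{m_{n'}}{n'} \right| \le \frac{1}{N} \quad \text{for all } n, n' \ge N \ge 1.$$

This implies that $(m_n/n)_{n=1}^\infty$ is a Cauchy sequence (Exercise 5.5.4). Since the $m_n/n$ are rational
numbers, we can now define the real number $S$ as

$$S := \mathrm{LIM}_{n\to\infty} \frac{m_n}{n}.$$

From Exercise 5.3.5 we conclude that

$$S = \mathrm{LIM}_{n\to\infty} \frac{m_n - 1}{n}.$$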

To finish the proof of the theorem, we need to show that $S$ is the least upper bound
for $E$. First we show that it is an upper bound. Let $x$ be any element of $E$. Then, since
$m_n/n$ is an upper bound for $E$, we have $x \le m_n/n$ for all $n \ge 1$. Applying Exercise
5.4.8, we conclude that $x \le \mathrm{LIM}_{n\to\infty} m_n/n = S$. Thus $S$ is indeed an upper bound
for $E$.

Now we show it is a least upper bound. Suppose $y$ is an upper bound for $E$. Since
$(m_n - 1)/n$ is not an upper bound, we conclude that $y \ge (m_n - 1)/n$ for all $n \ge 1$.
Applying Exercise 5.4.8, we conclude that $y \ge \mathrm{LIM}_{n\to\infty} (m_n - 1)/n = S$. Thus the
upper bound $S$ is less than or equal to every upper bound of $E$, and $S$ is thus a least
upper bound of $E$.
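
The construction of the integers $m_n$ can be traced on a concrete set. The Python sketch below takes $E = \{y \in \mathbf{R} : y \ge 0 \text{ and } y^2 < 2\}$, an example chosen here for illustration; for this particular $E$, a fraction $m/n$ with $m \ge 0$ is an upper bound exactly when $m^2 \ge 2n^2$, which lets $m_n$ be computed with integer arithmetic. The function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt

def m(n):
    """Least integer m such that m/n is an upper bound for E = {y >= 0 : y^2 < 2}.

    For this E, m/n is an upper bound exactly when m >= 0 and m^2 >= 2 n^2;
    the least such m is isqrt(2 n^2 - 1) + 1.
    """
    return isqrt(2 * n * n - 1) + 1

def approx(n):
    """The rational upper bound m_n / n."""
    return Fraction(m(n), n)

N = 10
# The bound |m_n/n - m_n'/n'| <= 1/N from the proof, checked for N <= n, n' < 200.
cauchy_bound_ok = all(
    abs(approx(n) - approx(k)) <= Fraction(1, N)
    for n in range(N, 200)
    for k in range(N, 200)
)
print(m(1), m(2), m(5), cauchy_bound_ok)
```

Each `approx(n)` has square at least 2 while $((m_n - 1)/n)^2 < 2$, so the rationals $m_n/n$ squeeze down onto the least upper bound of this $E$ from above, as in the proof.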


Definition 5.5.10 (Supremum). Let $E$ be a subset of the real numbers. If $E$ is non-empty
and has some upper bound, we define $\sup(E)$ to be the least upper bound of $E$
(this is well-defined by Theorem 5.5.9). We introduce two additional symbols, $+\infty$
and $-\infty$. If $E$ is non-empty and has no upper bound, we set $\sup(E) := +\infty$; if $E$ is
empty, we set $\sup(E) := -\infty$. We refer to $\sup(E)$ as the supremum of $E$, and also
denote it by $\sup E$.


Remark 5.5.11 At present, $+\infty$ and $-\infty$ are meaningless symbols; we have no
operations on them at present, and none of our results involving real numbers apply
to $+\infty$ and $-\infty$, because these are not real numbers. In Sect. 6.2 we add $+\infty$ and
$-\infty$ to the reals to form the extended real number system, but this system is not as
convenient to work with as the real number system, because many of the laws of
algebra break down. For instance, it is not a good idea to try to define $+\infty + -\infty$;
setting this equal to 0 causes some problems.


Now we give an example of how the least upper bound property is useful.


Proposition 5.5.12 There exists a positive real number $x$ such that $x^2 = 2$.


Remark 5.5.13 Comparing this result with Proposition 4.4.4, we see that certain 
numbers are real but not rational. The proof of this proposition also shows that the 
rationals Q do not obey the least upper bound property, otherwise one could use that 
property to construct a square root of 2, which by Proposition 4.4.4 is not possible. 


Proof Let $E$ be the set $\{y \in \mathbf{R} : y \ge 0 \text{ and } y^2 < 2\}$; thus $E$ is the set of all non-negative
real numbers whose square is less than 2. Observe that $E$ has an upper
bound of 2 (because if $y > 2$, then $y^2 > 4 > 2$ and hence $y \notin E$). Also, $E$ is non-empty
(for instance, 1 is an element of $E$). Thus by the least upper bound property,
we have a real number $x := \sup(E)$ which is the least upper bound of $E$. Then $x$
is greater than or equal to 1 (since $1 \in E$) and less than or equal to 2 (since 2 is an
upper bound for $E$). So $x$ is positive. Now we show that $x^2 = 2$.

We argue this by contradiction. We show that both $x^2 < 2$ and $x^2 > 2$ lead to
contradictions. First suppose that $x^2 < 2$. Let $0 < \varepsilon < 1$ be a small number; then we
have

$$(x + \varepsilon)^2 = x^2 + 2\varepsilon x + \varepsilon^2 \le x^2 + 4\varepsilon + \varepsilon = x^2 + 5\varepsilon$$

since $x \le 2$ and $\varepsilon^2 \le \varepsilon$. Since $x^2 < 2$, we see that we can choose an $0 < \varepsilon < 1$
such that $x^2 + 5\varepsilon < 2$, thus $(x + \varepsilon)^2 < 2$. By construction of $E$, this means that
$x + \varepsilon \in E$; but this contradicts the fact that $x$ is an upper bound of $E$.

Now suppose that $x^2 > 2$. Let $0 < \varepsilon < 1$ be a small number; then we have

$$(x - \varepsilon)^2 = x^2 - 2\varepsilon x + \varepsilon^2 \ge x^2 - 2\varepsilon x \ge x^2 - 4\varepsilon$$

since $x \le 2$ and $\varepsilon^2 \ge 0$. Since $x^2 > 2$, we can choose $0 < \varepsilon < 1$ such that $x^2 - 4\varepsilon > 2$,
and thus $(x - \varepsilon)^2 > 2$. But then this implies that $x - \varepsilon \ge y$ for all $y \in E$. (Why?
If $x - \varepsilon < y$ then $(x - \varepsilon)^2 < y^2 < 2$, a contradiction.) Thus $x - \varepsilon$ is an upper bound
for $E$, which contradicts the fact that $x$ is the least upper bound of $E$. From these
two contradictions we see that $x^2 = 2$, as desired.
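
The two choices of $\varepsilon$ in this proof can be made explicit. In the Python sketch below, the formulas $(2 - x^2)/10$ and $(x^2 - 2)/8$ are one convenient choice made for this illustration, not taken from the text, and the test values of $x$ are rationals between 1 and 2 standing in for the hypothetical cases $x^2 < 2$ and $x^2 > 2$.

```python
from fractions import Fraction

def eps_when_square_too_small(x):
    """For 1 <= x <= 2 with x^2 < 2: an eps in (0, 1) with x^2 + 5*eps < 2."""
    return (2 - x * x) / 10

def eps_when_square_too_large(x):
    """For 1 <= x <= 2 with x^2 > 2: an eps in (0, 1) with x^2 - 4*eps > 2."""
    return (x * x - 2) / 8

x = Fraction(7, 5)                      # x^2 = 49/25 < 2
e = eps_when_square_too_small(x)
print(e, (x + e) ** 2 < 2)              # x + e would still lie in E

x = Fraction(3, 2)                      # x^2 = 9/4 > 2
e = eps_when_square_too_large(x)
print(e, (x - e) ** 2 > 2)              # x - e would still be an upper bound
```

In the first case $x$ could not be an upper bound of $E$, and in the second it could not be the least one, which is exactly how the two contradictions arise.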


Remark 5.5.14 In Chap. 6 we will use the least upper bound property to develop 
the theory of limits, which allows us to do many more things than just take square 
roots. 


Remark 5.5.15 We can of course talk about lower bounds and greatest lower bounds
of sets $E$; the greatest lower bound of a set $E$ is also known as the infimum⁴ of $E$ and
is denoted $\inf(E)$ or $\inf E$. Everything we say about suprema has a counterpart for
infima; we will usually leave such statements to the reader. A precise relationship
between the two notions is given by Exercise 5.5.1. See also Sect. 6.2.


4 Supremum means “highest” and infimum means “lowest”, and the plurals are suprema and infima.
Supremum is to superior, and infimum to inferior, as maximum is to major, and minimum to minor.
The root words are “super”, which means “above”, and “infer”, which means “below” (this usage
only survives in a few rare English words such as “infernal”, with the Latin prefix “sub” having
mostly replaced “infer” in English).


— Exercises — 


Exercise 5.5.1 Let $E$ be a subset of the real numbers $\mathbf{R}$, and suppose that $E$ has a least upper bound
$M$ which is a real number, i.e., $M = \sup(E)$. Let $-E$ be the set

$$-E := \{-x : x \in E\}.$$

Show that $-M$ is the greatest lower bound of $-E$, i.e., $-M = \inf(-E)$.


Exercise 5.5.2 Let $E$ be a non-empty subset of $\mathbf{R}$, let $n \ge 1$ be an integer, and let $L < K$ be integers.
Suppose that $K/n$ is an upper bound for $E$, but that $L/n$ is not an upper bound for $E$. Without using
Theorem 5.5.9, show that there exists an integer $L < m \le K$ such that $m/n$ is an upper bound for
$E$, but that $(m - 1)/n$ is not an upper bound for $E$. (Hint: prove by contradiction, and use induction.
It may also help to draw a picture of the situation.)


Exercise 5.5.3 Let $E$ be a non-empty subset of $\mathbf{R}$, let $n \ge 1$ be an integer, and let $m, m'$ be integers
with the properties that $m/n$ and $m'/n$ are upper bounds for $E$, but $(m - 1)/n$ and $(m' - 1)/n$ are
not upper bounds for $E$. Show that $m = m'$. This shows that the integer $m$ constructed in Exercise
5.5.2 is unique. (Hint: again, drawing a picture will be helpful.)


Exercise 5.5.4 Let $q_1, q_2, q_3, \ldots$ be a sequence of rational numbers with the property that
$|q_n - q_{n'}| \le \frac{1}{M}$ whenever $M \ge 1$ is an integer and $n, n' \ge M$. Show that $q_1, q_2, q_3, \ldots$ is a Cauchy
sequence. Furthermore, if $S := \mathrm{LIM}_{n\to\infty} q_n$, show that $|q_M - S| \le \frac{1}{M}$ for every $M \ge 1$. (Hint:
use Corollary 5.4.10 or Exercise 5.4.8.)


Exercise 5.5.5 Establish an analogue of Proposition 5.4.14, in which “rational” is replaced by 
“irrational”. 


5.6 Real Exponentiation, Part I 


In Sect. 4.3 we defined exponentiation $x^n$ when $x$ is rational and $n$ is a natural
number, or when $x$ is a non-zero rational and $n$ is an integer. Now that we have
all the arithmetic operations on the reals (and Proposition 5.4.7 assures us that the
arithmetic properties of the rationals that we are used to continue to hold for the
reals) we can similarly define exponentiation of the reals.


Definition 5.6.1 (Exponentiating a real by a natural number). Let $x$ be a real number.
To raise $x$ to the power 0, we define $x^0 := 1$. Now suppose recursively that $x^n$
has been defined for some natural number $n$, then we define $x^{n+1} := x^n \times x$.


Definition 5.6.2 (Exponentiating a real by an integer). Let $x$ be a non-zero real
number. Then for any negative integer $-n$, we define $x^{-n} := 1/x^n$.


Clearly these definitions are consistent with the definition of rational exponentiation
given earlier. We can then assert


Proposition 5.6.3 All the properties in Propositions 4.3.10 and 4.3.12 remain valid
if $x$ and $y$ are assumed to be real numbers instead of rational numbers.


Instead of giving an actual proof of this proposition, we shall give a meta-proof 
(an argument appealing to the nature of proofs, rather than the nature of real and 
rational numbers). 


Meta-proof: If one inspects the proof of Propositions 4.3.10 and 4.3.12 we see that
they rely on the laws of algebra and the laws of order for the rationals (Propositions
4.2.4 and 4.2.9). But by Propositions 5.3.11 and 5.4.7, and the identity
$xx^{-1} = x^{-1}x = 1$, we know that all these laws of algebra and order continue to hold for real
numbers as well as rationals. Thus we can modify the proof of Propositions 4.3.10
and 4.3.12 to hold in the case when $x$ and $y$ are real.

Now we consider exponentiation to exponents which are not integers. We begin
with the notion of an $n$th root, which we can define using our notion of supremum.


Definition 5.6.4 Let $x \ge 0$ be a non-negative real, and let $n \ge 1$ be a positive integer.
We define $x^{1/n}$, also known as the $n$th root of $x$, by the formula

$$x^{1/n} := \sup\{y \in \mathbf{R} : y \ge 0 \text{ and } y^n \le x\}.$$

We often write $\sqrt{x}$ for $x^{1/2}$.


Note we do not define the nth root of a negative number. In fact, we will leave the 
nth roots of negative numbers undefined for the rest of the text (one can define these 
nth roots once one defines the complex numbers, but we shall refrain from doing so). 


Lemma 5.6.5 (Existence of nth roots). Let $x \ge 0$ be a non-negative real, and let
$n \ge 1$ be a positive integer. Then the set $E := \{y \in \mathbf{R} : y \ge 0 \text{ and } y^n \le x\}$ is non-empty
and is also bounded above. In particular, $x^{1/n}$ is a real number.


Proof The set $E$ contains 0 (why?), so it is certainly not empty. Now we show it has
an upper bound. We divide into two cases: $x \le 1$ and $x > 1$. First suppose that we
are in the case where $x \le 1$. Then we claim that the set $E$ is bounded above by 1.
To see this, suppose for sake of contradiction that there was an element $y \in E$ for
which $y > 1$. But then $y^n > 1$ (why?), and hence $y^n > x$, a contradiction. Thus $E$
has an upper bound. Now suppose that we are in the case where $x > 1$. Then we
claim that the set $E$ is bounded above by $x$. To see this, suppose for contradiction
that there was an element $y \in E$ for which $y > x$. Since $x > 1$, we thus have $y > 1$.
Since $y > x$ and $y > 1$, we have $y^n > x$ (why?), a contradiction. Thus in both cases
$E$ has an upper bound, and so $x^{1/n}$ is finite.


We list some basic properties of nth roots below. 


Lemma 5.6.6 Let $x, y \ge 0$ be non-negative reals, and let $n, m \ge 1$ be positive integers.

(a) If $y = x^{1/n}$, then $y^n = x$.

(b) Conversely, if $y^n = x$, then $y = x^{1/n}$.

(c) $x^{1/n}$ is a non-negative real number, and is positive iff $x$ is positive.

(d) We have $x > y$ if and only if $x^{1/n} > y^{1/n}$.

(e) If $x > 1$, then $x^{1/k}$ is a decreasing function of $k$, where $k$ ranges over the positive
integers; that is to say, $x^{1/k} < x^{1/l}$ whenever $k > l$. If $0 < x < 1$, then $x^{1/k}$ is
an increasing function of $k$ (i.e., $x^{1/k} > x^{1/l}$ whenever $k > l$). If $x = 1$, then
$x^{1/k} = 1$ for all $k$.

(f) We have $(xy)^{1/n} = x^{1/n} y^{1/n}$.

(g) We have $(x^{1/n})^{1/m} = x^{1/nm}$.


Proof See Exercise 5.6.1. 


The observant reader may note that this definition of $x^{1/n}$ might possibly be
inconsistent with our previous notion of $x^n$ when $n = 1$, but it is easy to check that
$x^{1/1} = x = x^1$ (why?), so there is no inconsistency.

One consequence of Lemma 5.6.6(b) is another proof of the cancelation law from
Proposition 4.3.12(c) and Proposition 5.6.3: if $y$ and $z$ are positive and $y^n = z^n$, then
$y = z$. (Why does this follow from Lemma 5.6.6(b)?) Note that this only works when
$y$ and $z$ are positive; for instance, $(-3)^4 = 3^4$, but we cannot conclude from this that
$-3 = 3$.

Now we define how to raise a positive number x to a rational exponent q. 


Definition 5.6.7 Let $x > 0$ be a positive real number, and let $q$ be a rational number.
To define $x^q$, we write $q = a/b$ for some integer $a$ and positive integer $b$, and define

$$x^q := (x^{1/b})^a.$$


Note that every rational $q$, whether positive, negative, or zero, can be written in
the form $a/b$ where $a$ is an integer and $b$ is positive (why?). However, the rational
number $q$ can be expressed in the form $a/b$ in more than one way, for instance 1/2
can also be expressed as 2/4 or 3/6. So to ensure that this definition is well-defined,
we need to check that different expressions $a/b$ give the same formula for $x^q$:


Lemma 5.6.8 Let $a, a'$ be integers and $b, b'$ be positive integers such that $a/b = a'/b'$,
and let $x$ be a positive real number. Then we have $(x^{1/b'})^{a'} = (x^{1/b})^a$.


Proof There are three cases: $a = 0$, $a > 0$, $a < 0$. If $a = 0$, then we must have
$a' = 0$ (why?) and so both $(x^{1/b'})^{a'}$ and $(x^{1/b})^a$ are equal to 1, so we are done.

Now suppose that $a > 0$. Then $a' > 0$ (why?), and $ab' = ba'$. Write $y := x^{1/(ab')} =
x^{1/(ba')}$. By Lemma 5.6.6(g) we have $y = (x^{1/b'})^{1/a}$ and $y = (x^{1/b})^{1/a'}$; by Lemma
5.6.6(a) we thus have $y^a = x^{1/b'}$ and $y^{a'} = x^{1/b}$. Thus we have

    $$(x^{1/b'})^{a'} = (y^a)^{a'} = y^{aa'} = (y^{a'})^a = (x^{1/b})^a$$

as desired.

Finally, suppose that $a < 0$. Then we have $(-a)/b = (-a')/b'$. But $-a$ is positive,
so the previous case applies and we have $(x^{1/b'})^{-a'} = (x^{1/b})^{-a}$. Taking the reciprocal
of both sides we obtain the result.


Thus $x^q$ is well-defined for every rational $q$. Note that this new definition is
consistent with our old definition for $x^{1/n}$ (why?) and is also consistent with our old
definition for $x^n$ (why?).
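
As a purely numerical illustration of Definition 5.6.7 and Lemma 5.6.8 (not a substitute for the proof), the following Python sketch evaluates $(x^{1/b})^a$ for several representations $a/b$ of the same rational. The helper name `rational_power` is hypothetical, and floating-point arithmetic is assumed as a stand-in for exact real arithmetic, so agreement is only checked up to rounding error.

```python
import math

def rational_power(x: float, a: int, b: int) -> float:
    """Evaluate (x^(1/b))^a for x > 0, integer a, positive integer b.

    Floating-point approximation only; the b-th root is taken first,
    then raised to the integer power a, mirroring Definition 5.6.7.
    """
    if x <= 0 or b <= 0:
        raise ValueError("need x > 0 and b > 0")
    root = x ** (1.0 / b)      # approximate b-th root of x
    return root ** a           # negative a gives the reciprocal

# 1/2, 2/4 and 3/6 all represent the same rational exponent.
x = 5.0
values = [rational_power(x, a, b) for (a, b) in [(1, 2), (2, 4), (3, 6)]]
agree = all(math.isclose(v, values[0], rel_tol=1e-12) for v in values)
print(values, agree)
```

The three values should coincide up to rounding, which is what Lemma 5.6.8 predicts for exact arithmetic.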

Some basic facts about rational exponentiation: 


Lemma 5.6.9 Let x, y > 0 be positive reals, and let q,r be rationals. 


(a) $x^q$ is a positive real.

(b) $x^{q+r} = x^q x^r$ and $(x^q)^r = x^{qr}$.

(c) $x^{-q} = 1/x^q$.

(d) If $q > 0$, then $x > y$ if and only if $x^q > y^q$.

(e) If $x > 1$, then $x^q > x^r$ if and only if $q > r$. If $x < 1$, then $x^q > x^r$ if and only if
$q < r$.

(f) $(xy)^q = x^q y^q$.


Proof See Exercise 5.6.2. 


We still have to do real exponentiation; in other words, we still have to define $x^y$
where $x > 0$ and $y$ is a real number, but we will defer that until Sect. 6.7, once we
have formalized the concept of limit.

In the rest of the text we shall now just assume the real numbers to obey all the 
usual laws of algebra, order, and exponentiation. 


— Exercises — 


Exercise 5.6.1 Prove Lemma 5.6.6. (Hints: review the proof of Proposition 5.5.12. Also, you will
find proof by contradiction a useful tool, especially when combined with the trichotomy of order
in Proposition 5.4.7 and Proposition 5.4.12. The earlier parts of the lemma can be used to prove
later parts of the lemma. With part (e), first show that if $x > 1$ then $x^{1/n} > 1$, and if $x < 1$ then
$x^{1/n} < 1$.)


Exercise 5.6.2. Prove Lemma 5.6.9. (Hint: you should rely mainly on Lemma 5.6.6 and on algebra.) 


Exercise 5.6.3 If $x$ is a real number and $n$ is an even natural number (thus $n = 2m$ for some natural
number $m$), show that $x^n \geq 0$.


Exercise 5.6.4 If $x$ is a real number, show that $|x| = (x^2)^{1/2}$.

Exercise 5.6.5 If $x, y$ are positive reals, and $q$ is a positive rational with $q > 1$, show that
$\max(x^q, y^q) = \max(x, y)^q$ and $\min(x^q, y^q) = \min(x, y)^q$, where the operations min, max were
defined in Exercise 5.4.9. What happens if we have $q < 1$ instead of $q > 1$?


Chapter 6
Limits of Sequences


6.1 Convergence and Limit Laws 


In the previous chapter, we defined the real numbers as formal limits of rational
(Cauchy) sequences, and we then defined various operations on the real numbers.
However, unlike our work in constructing the integers (where we eventually
replaced formal differences with actual differences) and rationals (where we eventually
replaced formal quotients with actual quotients), we did not completely finish
the job of constructing the real numbers, because we never got around to replacing
formal limits $\mathrm{LIM}_{n \to \infty} a_n$ with actual limits $\lim_{n \to \infty} a_n$. In fact, we haven't defined
limits at all yet. This will now be rectified.

We begin by repeating much of the machinery of $\varepsilon$-close sequences, etc., again,
but this time we do it for sequences of real numbers, not rational numbers. Thus
this discussion will supersede what we did in the previous chapter. First, we define
distance for real numbers:


Definition 6.1.1 (Distance between two real numbers). Given two real numbers x 
and y, we define their distance d(x, y) to be d(x, y) := |x — y|. 


Clearly this definition is consistent with Definition 4.3.2. Further, Proposition 
4.3.3 works just as well for real numbers as it does for rationals, because the real 
numbers obey all the rules of algebra that the rationals do. 


Definition 6.1.2 ($\varepsilon$-close real numbers). Let $\varepsilon > 0$ be a real number. We say that
two real numbers $x, y$ are $\varepsilon$-close iff we have $d(y, x) \leq \varepsilon$.


Again, it is clear that this definition of $\varepsilon$-close is consistent with Definition 4.3.4.
Now let $(a_n)_{n=m}^\infty$ be a sequence of real numbers; i.e., we assign a real number
$a_n$ for every integer $n \geq m$. The starting index $m$ is some integer; usually this will
be 1, but in some cases we will start from some index other than 1. (The choice of
label used to index this sequence is unimportant; we could use for instance $(a_k)_{k=m}^\infty$
and this would represent exactly the same sequence as $(a_n)_{n=m}^\infty$.) We can define the
notion of a Cauchy sequence in the same manner as before.
Definition 6.1.3 (Cauchy sequences of reals). Let $\varepsilon > 0$ be a real number. A
sequence $(a_n)_{n=N}^\infty$ of real numbers starting at some integer index $N$ is said to be
$\varepsilon$-steady iff $a_j$ and $a_k$ are $\varepsilon$-close for every $j, k \geq N$. A sequence $(a_n)_{n=m}^\infty$ starting at
some integer index $m$ is said to be eventually $\varepsilon$-steady iff there exists an $N \geq m$ such
that $(a_n)_{n=N}^\infty$ is $\varepsilon$-steady. We say that $(a_n)_{n=m}^\infty$ is a Cauchy sequence iff it is eventually
$\varepsilon$-steady for every $\varepsilon > 0$.


To put it another way, a sequence $(a_n)_{n=m}^\infty$ of real numbers is a Cauchy sequence
if, for every real $\varepsilon > 0$, there exists an $N \geq m$ such that $|a_n - a_{n'}| \leq \varepsilon$ for all $n, n' \geq
N$. These definitions are consistent with the corresponding definitions for rational
numbers (Definitions 5.1.3, 5.1.6, 5.1.8), although verifying consistency for Cauchy
sequences takes a little bit of care.

Proposition 6.1.4 Let $(a_n)_{n=m}^\infty$ be a sequence of rational numbers starting at some
integer index $m$. Then $(a_n)_{n=m}^\infty$ is a Cauchy sequence in the sense of Definition 5.1.8
if and only if it is a Cauchy sequence in the sense of Definition 6.1.3.


Proof Suppose first that $(a_n)_{n=m}^\infty$ is a Cauchy sequence in the sense of Definition
6.1.3; then it is eventually $\varepsilon$-steady for every real $\varepsilon > 0$. In particular, it is eventually
$\varepsilon$-steady for every rational $\varepsilon > 0$, which makes it a Cauchy sequence in the sense of
Definition 5.1.8.

Now suppose that $(a_n)_{n=m}^\infty$ is a Cauchy sequence in the sense of Definition 5.1.8;
then it is eventually $\varepsilon$-steady for every rational $\varepsilon > 0$. If $\varepsilon > 0$ is a real number, then
there exists a rational $\varepsilon' > 0$ which is smaller than $\varepsilon$, by Proposition 5.4.12. Since $\varepsilon'$
is rational, we know that $(a_n)_{n=m}^\infty$ is eventually $\varepsilon'$-steady; since $\varepsilon' < \varepsilon$, this implies
that $(a_n)_{n=m}^\infty$ is eventually $\varepsilon$-steady. Since $\varepsilon$ is an arbitrary positive real number, we
thus see that $(a_n)_{n=m}^\infty$ is a Cauchy sequence in the sense of Definition 6.1.3.


Because of this proposition, we will no longer care about the distinction between 
Definition 5.1.8 and Definition 6.1.3 and view the concept of a Cauchy sequence as 
a single unified concept. 

Now we talk about what it means for a sequence of real numbers to converge to 
some limit L. 


Definition 6.1.5 (Convergence of sequences). Let $\varepsilon > 0$ be a real number, and let
$L$ be a real number. A sequence $(a_n)_{n=N}^\infty$ of real numbers is said to be $\varepsilon$-close to $L$
iff $a_n$ is $\varepsilon$-close to $L$ for every $n \geq N$, i.e., we have $|a_n - L| \leq \varepsilon$ for every $n \geq N$.
We say that a sequence $(a_n)_{n=m}^\infty$ is eventually $\varepsilon$-close to $L$ iff there exists an $N \geq m$
such that $(a_n)_{n=N}^\infty$ is $\varepsilon$-close to $L$. We say that a sequence $(a_n)_{n=m}^\infty$ converges to $L$ iff
it is eventually $\varepsilon$-close to $L$ for every real $\varepsilon > 0$.


One can unwrap all the definitions here and write the concept of convergence 
more directly; see Exercise 6.1.2. 


Example 6.1.6 The sequence 


0.9, 0.99, 0.999, 0.9999, ... 
is 0.1-close to 1 but is not 0.01-close to 1, because of the first element of the sequence.
However, it is eventually 0.01-close to 1. In fact, for every real $\varepsilon > 0$, this sequence
is eventually $\varepsilon$-close to 1, hence is convergent to 1.
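
The claims in Example 6.1.6 can be spot-checked on a finite window of indices. The Python sketch below is only an illustration under stated assumptions: the names `a` and `close_on_window` are hypothetical, exact rational arithmetic (`fractions.Fraction`) is used to avoid rounding, and a check over finitely many terms can of course only suggest, not prove, a statement about the whole tail.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    """n-th term of 0.9, 0.99, 0.999, ... (indices start at n = 1)."""
    return 1 - Fraction(1, 10 ** n)

def close_on_window(start: int, stop: int, L: Fraction, eps: Fraction) -> bool:
    """True if |a_n - L| <= eps for every index n with start <= n < stop."""
    return all(abs(a(n) - L) <= eps for n in range(start, stop))

one = Fraction(1)
print(close_on_window(1, 50, one, Fraction(1, 10)))    # whole window, eps = 0.1
print(close_on_window(1, 50, one, Fraction(1, 100)))   # fails at n = 1
print(close_on_window(2, 50, one, Fraction(1, 100)))   # tail starting at N = 2
```

The first and third checks succeed while the second fails, matching the example: the only obstruction to 0.01-closeness is the first term.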

Proposition 6.1.7 (Uniqueness of limits). Let $(a_n)_{n=m}^\infty$ be a real sequence starting
at some integer index $m$, and let $L \neq L'$ be two distinct real numbers. Then it is not
possible for $(a_n)_{n=m}^\infty$ to converge to $L$ while also converging to $L'$.


Proof Suppose for sake of contradiction that $(a_n)_{n=m}^\infty$ was converging to both $L$ and
$L'$. Let $\varepsilon = |L - L'|/3$; note that $\varepsilon$ is positive since $L \neq L'$. Since $(a_n)_{n=m}^\infty$ converges
to $L$, we know that $(a_n)_{n=m}^\infty$ is eventually $\varepsilon$-close to $L$; thus there is an $N \geq m$ such that
$d(a_n, L) \leq \varepsilon$ for all $n \geq N$. Similarly, there is an $M \geq m$ such that $d(a_n, L') \leq \varepsilon$ for
all $n \geq M$. In particular, if we set $n := \max(N, M)$, then we have $d(a_n, L) \leq \varepsilon$ and
$d(a_n, L') \leq \varepsilon$, hence by the triangle inequality $d(L, L') \leq 2\varepsilon = 2|L - L'|/3$. But
then we have $|L - L'| \leq 2|L - L'|/3$, which contradicts the fact that $|L - L'| > 0$.
Thus it is not possible to converge to both $L$ and $L'$.


Now that we know limits are unique, we can set up notation to specify them: 


Definition 6.1.8 (Limits of sequences). If a sequence $(a_n)_{n=m}^\infty$ converges to some
real number $L$, we say that $(a_n)_{n=m}^\infty$ is convergent and that its limit is $L$; we write

    $$L = \lim_{n \to \infty} a_n$$

to denote this fact. If a sequence $(a_n)_{n=m}^\infty$ is not converging to any real number $L$, we
say that the sequence $(a_n)_{n=m}^\infty$ is divergent and we leave $\lim_{n \to \infty} a_n$ undefined.


Note that Proposition 6.1.7 ensures that a sequence can have at most one limit. 
Thus, if the limit exists, it is a single real number, otherwise it is undefined. 


Remark 6.1.9 The notation $\lim_{n \to \infty} a_n$ does not give any indication about the starting
index $m$ of the sequence, but the starting index is irrelevant (Exercise 6.1.3). Thus
in the rest of this discussion we shall not be too careful as to where these sequences
start, as we shall be mostly focused on their limits.


We sometimes use the phrase “$a_n \to x$ as $n \to \infty$” as an alternate way of writing
the statement “$(a_n)_{n=m}^\infty$ converges to $x$”. Bear in mind, though, that the individual
statements $a_n \to x$ and $n \to \infty$ do not have any rigorous meaning; this phrase is
just a convention, though of course a very suggestive one.


Remark 6.1.10 The exact choice of letter used to denote the index (in this case $n$)
is irrelevant: the phrase $\lim_{n \to \infty} a_n$ has exactly the same meaning as $\lim_{k \to \infty} a_k$,
for instance. Sometimes it will be convenient to change the label of the index to
avoid conflicts of notation; for instance, we might want to change $n$ to $k$ because $n$ is
simultaneously being used for some other purpose, and we want to reduce confusion.
See Exercise 6.1.4.




As an example of a limit, we present

Proposition 6.1.11 We have $\lim_{n \to \infty} 1/n = 0$.


Proof We have to show that the sequence $(a_n)_{n=1}^\infty$ converges to 0, where $a_n := 1/n$. In
other words, for every $\varepsilon > 0$, we need to show that the sequence $(a_n)_{n=1}^\infty$ is eventually
$\varepsilon$-close to 0. So, let $\varepsilon > 0$ be an arbitrary real number. We have to find an $N$ such
that $|a_n - 0| \leq \varepsilon$ for every $n \geq N$. But if $n \geq N$, then

    $$|a_n - 0| = |1/n - 0| = 1/n \leq 1/N.$$

Thus, if we pick $N > 1/\varepsilon$ (which we can do by the Archimedean principle), then
$1/N < \varepsilon$, and so $(a_n)_{n=N}^\infty$ is $\varepsilon$-close to 0. Thus $(a_n)_{n=1}^\infty$ is eventually $\varepsilon$-close to 0.
Since $\varepsilon$ was arbitrary, $(a_n)_{n=1}^\infty$ converges to 0.
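
The choice of $N$ in the proof of Proposition 6.1.11 can be made concrete. The following Python sketch assumes a rational $\varepsilon$ (so that exact arithmetic is available) and uses a hypothetical helper `threshold` that returns one particular integer $N > 1/\varepsilon$; any larger integer would serve equally well.

```python
import math
from fractions import Fraction

def threshold(eps: Fraction) -> int:
    """Return an integer N with N > 1/eps, as in the proof (needs eps > 0)."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return math.floor(1 / eps) + 1   # smallest integer strictly above 1/eps

eps = Fraction(1, 100)
N = threshold(eps)
# Spot-check |1/n - 0| <= eps on a finite stretch of the tail n >= N.
ok = all(Fraction(1, n) <= eps for n in range(N, N + 1000))
print(N, ok)
```

The finite check only confirms what the inequality $1/n \leq 1/N < \varepsilon$ already guarantees for every $n \geq N$.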


Proposition 6.1.12 (Convergent sequences are Cauchy). Suppose that $(a_n)_{n=m}^\infty$ is a
convergent sequence of real numbers. Then $(a_n)_{n=m}^\infty$ is also a Cauchy sequence.


Proof See Exercise 6.1.5.


Example 6.1.13 The sequence $1, -1, 1, -1, 1, -1, \ldots$ is not a Cauchy sequence
(because it is not eventually 1-steady), and is hence not a convergent sequence, by
Proposition 6.1.12.


Remark 6.1.14 For a converse to Proposition 6.1.12, see Theorem 6.4.18. 


Now we show that formal limits can be superseded by actual limits, just as formal
subtraction was superseded by actual subtraction when constructing the integers,
and formal division superseded by actual division when constructing the rational
numbers.


Proposition 6.1.15 (Formal limits are genuine limits). Suppose that $(a_n)_{n=1}^\infty$ is a
Cauchy sequence of rational numbers. Then $(a_n)_{n=1}^\infty$ converges to $\mathrm{LIM}_{n \to \infty} a_n$; i.e.,

    $$\mathrm{LIM}_{n \to \infty} a_n = \lim_{n \to \infty} a_n.$$


Proof See Exercise 6.1.6. 


Definition 6.1.16 (Bounded sequences). A sequence $(a_n)_{n=m}^\infty$ of real numbers is
bounded by a real number $M$ iff we have $|a_n| \leq M$ for all $n \geq m$. We say that
$(a_n)_{n=m}^\infty$ is bounded iff it is bounded by $M$ for some real number $M \geq 0$.


This definition is consistent with Definition 5.1.12; see Exercise 6.1.7. 

Recall from Lemma 5.1.15 that every Cauchy sequence of rational numbers is 
bounded. An inspection of the proof of that Lemma shows that the same argument 
works for real numbers; every Cauchy sequence of real numbers is bounded. In 
particular, from Proposition 6.1.12 we have 
Corollary 6.1.17 Every convergent sequence of real numbers is bounded.


Example 6.1.18 The sequence $1, 2, 3, 4, 5, \ldots$ is not bounded, and hence is not
convergent.


We can now prove the usual limit laws. 


Theorem 6.1.19 (Limit Laws). Let $(a_n)_{n=m}^\infty$ and $(b_n)_{n=m}^\infty$ be convergent sequences of
real numbers, and let $x, y$ be the real numbers $x := \lim_{n \to \infty} a_n$ and $y := \lim_{n \to \infty} b_n$.


(a) The sequence $(a_n + b_n)_{n=m}^\infty$ converges to $x + y$; in other words,

    $$\lim_{n \to \infty} (a_n + b_n) = \lim_{n \to \infty} a_n + \lim_{n \to \infty} b_n.$$

(b) The sequence $(a_n b_n)_{n=m}^\infty$ converges to $xy$; in other words,

    $$\lim_{n \to \infty} (a_n b_n) = \left(\lim_{n \to \infty} a_n\right) \left(\lim_{n \to \infty} b_n\right).$$

(c) For any real number $c$, the sequence $(c a_n)_{n=m}^\infty$ converges to $cx$; in other words,

    $$\lim_{n \to \infty} (c a_n) = c \lim_{n \to \infty} a_n.$$

(d) The sequence $(a_n - b_n)_{n=m}^\infty$ converges to $x - y$; in other words,

    $$\lim_{n \to \infty} (a_n - b_n) = \lim_{n \to \infty} a_n - \lim_{n \to \infty} b_n.$$

(e) Suppose that $y \neq 0$, and that $b_n \neq 0$ for all $n \geq m$. Then the sequence $(b_n^{-1})_{n=m}^\infty$
converges to $y^{-1}$; in other words,

    $$\lim_{n \to \infty} b_n^{-1} = \left(\lim_{n \to \infty} b_n\right)^{-1}.$$

(f) Suppose that $y \neq 0$, and that $b_n \neq 0$ for all $n \geq m$. Then the sequence $(a_n / b_n)_{n=m}^\infty$
converges to $x/y$; in other words,

    $$\lim_{n \to \infty} \frac{a_n}{b_n} = \frac{\lim_{n \to \infty} a_n}{\lim_{n \to \infty} b_n}.$$

(g) The sequence¹ $(\max(a_n, b_n))_{n=m}^\infty$ converges to $\max(x, y)$; in other words,

    $$\lim_{n \to \infty} \max(a_n, b_n) = \max\left(\lim_{n \to \infty} a_n, \lim_{n \to \infty} b_n\right).$$

(h) The sequence $(\min(a_n, b_n))_{n=m}^\infty$ converges to $\min(x, y)$; in other words,

    $$\lim_{n \to \infty} \min(a_n, b_n) = \min\left(\lim_{n \to \infty} a_n, \lim_{n \to \infty} b_n\right).$$

¹ The operations min, max are defined in Exercise 5.4.9.


Proof See Exercise 6.1.8.
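
To see the limit laws at work on a concrete pair of sequences, consider the sample choices $a_n := 1 + 1/n$ and $b_n := 2 - 1/n$, which converge to $x = 1$ and $y = 2$ respectively. These sequences, and the helper names in the Python sketch below, are illustrative assumptions rather than anything taken from the text; the sketch merely measures how far the combined sequences are from the limits predicted by Theorem 6.1.19 at a single large index, using exact rational arithmetic.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    return 1 + Fraction(1, n)    # sample sequence converging to x = 1

def b(n: int) -> Fraction:
    return 2 - Fraction(1, n)    # sample sequence converging to y = 2

def gap(seq, limit, n: int) -> Fraction:
    """Distance |seq(n) - limit| between the n-th term and a proposed limit."""
    return abs(seq(n) - limit)

n = 1000
print(gap(lambda k: a(k) + b(k), 3, n))               # law (a): x + y = 3
print(gap(lambda k: a(k) * b(k), 2, n))               # law (b): xy = 2
print(gap(lambda k: a(k) / b(k), Fraction(1, 2), n))  # law (f): x/y = 1/2
print(gap(lambda k: max(a(k), b(k)), 2, n))           # law (g): max(x, y) = 2
```

Each gap is at most of size roughly $1/n$, consistent with (though of course not a proof of) the convergence asserted in parts (a), (b), (f) and (g).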


— Exercises — 


Exercise 6.1.1 Let $(a_n)_{n=0}^\infty$ be a sequence of real numbers, such that $a_{n+1} > a_n$ for each natural
number $n$. Prove that whenever $n$ and $m$ are natural numbers such that $m > n$, then we have $a_m > a_n$.
(We refer to these sequences as strictly increasing sequences.)


Exercise 6.1.2 Let $(a_n)_{n=m}^\infty$ be a sequence of real numbers, and let $L$ be a real number. Show that
$(a_n)_{n=m}^\infty$ converges to $L$ if and only if, given any real $\varepsilon > 0$, one can find an $N \geq m$ such that
$|a_n - L| \leq \varepsilon$ for all $n \geq N$.


Exercise 6.1.3 Let $(a_n)_{n=m}^\infty$ be a sequence of real numbers, let $c$ be a real number, and let $m' \geq m$
be an integer. Show that $(a_n)_{n=m}^\infty$ converges to $c$ if and only if $(a_n)_{n=m'}^\infty$ converges to $c$.

Exercise 6.1.4 Let $(a_n)_{n=m}^\infty$ be a sequence of real numbers, let $c$ be a real number, and let $k \geq 0$ be
a non-negative integer. Show that $(a_n)_{n=m}^\infty$ converges to $c$ if and only if $(a_{n+k})_{n=m}^\infty$ converges to $c$.

Exercise 6.1.5 Prove Proposition 6.1.12. (Hint: use the triangle inequality, or Proposition 4.3.7.)


Exercise 6.1.6 Prove Proposition 6.1.15, using the following outline. Let $(a_n)_{n=m}^\infty$ be a Cauchy
sequence of rationals, and write $L := \mathrm{LIM}_{n \to \infty} a_n$. We have to show that $(a_n)_{n=m}^\infty$ converges to $L$.
Let $\varepsilon > 0$. Assume for sake of contradiction that the sequence $a_n$ is not eventually $\varepsilon$-close to $L$. Use this,
and the fact that $(a_n)_{n=m}^\infty$ is Cauchy, to show that there is an $N \geq m$ such that either $a_n > L + \varepsilon/2$
for all $n \geq N$, or $a_n < L - \varepsilon/2$ for all $n \geq N$. Then use Exercise 5.4.8.


Exercise 6.1.7 Show that Definition 6.1.16 is consistent with Definition 5.1.12 (i.e., prove an 
analogue of Proposition 6.1.4 for bounded sequences instead of Cauchy sequences). 


Exercise 6.1.8 Prove Theorem 6.1.19. (Hint: you can use some parts of the theorem to prove others, 
e.g., (b) can be used to prove (c); (a),(c) can be used to prove (d); and (b), (e) can be used to prove 
(f). The proofs are similar to those of Lemma 5.3.6, Proposition 5.3.10, and Lemma 5.3.15. For (e), 
you may need to first prove the auxiliary result that any sequence whose elements are non-zero, and 
which converges to a non-zero limit, is bounded away from zero.) 


Exercise 6.1.9 Explain why Theorem 6.1.19(f) fails when the limit of the denominator is 0. (To
repair that problem requires L'Hôpital's rule, see Section 10.5.)


Exercise 6.1.10 Show that the concept of equivalent Cauchy sequence, as defined in Definition
5.2.6, does not change if $\varepsilon$ is required to be positive real instead of positive rational. More precisely,
if $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ are sequences of reals, show that $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ are eventually $\varepsilon$-close
for every rational $\varepsilon > 0$ if and only if they are eventually $\varepsilon$-close for every real $\varepsilon > 0$. (Hint: modify
the proof of Proposition 6.1.4.)
6.2 The Extended Real Number System


There are some sequences which do not converge to any real number but instead
seem to be wanting to converge to $+\infty$ or $-\infty$. For instance, it seems intuitive that
the sequence

    $$1, 2, 3, 4, 5, \ldots$$

should be converging to $+\infty$, while

    $$-1, -2, -3, -4, -5, \ldots$$

should be converging to $-\infty$. Meanwhile, the sequence

    $$1, -1, 1, -1, 1, -1, \ldots$$

does not seem to be converging to anything (although we shall see later that it does
have $+1$ and $-1$ as “limit points”; see below). Similarly the sequence

    $$1, -2, 3, -4, 5, -6, \ldots$$

does not converge to any real number, and also does not appear to be converging to
$+\infty$ or converging to $-\infty$. To make this precise we need to talk about something
called the extended real number system.


Definition 6.2.1 (Extended real number system). The extended real number system
$\mathbf{R}^*$ is the real line $\mathbf{R}$ with two additional elements attached, called $+\infty$ and $-\infty$.
These elements are distinct from each other and also distinct from every real number.
An extended real number $x$ is called finite iff it is a real number, and infinite iff it is
equal to $+\infty$ or $-\infty$. (This definition is not directly related to the notion of finite
and infinite sets in Section 3.6, though it is of course similar in spirit.)


These new symbols, $+\infty$ and $-\infty$, at present do not have much meaning, since
we have no operations to manipulate them (other than equality $=$ and inequality
$\neq$). As with many of the other mathematical concepts considered here, the precise
construction of $+\infty$ and $-\infty$ is not important, but (by Exercise 3.2.2) one could
for instance set $+\infty := \{\mathbf{R}\}$ and $-\infty := \{\mathbf{R} \cup \{+\infty\}\}$ if desired. Now we place a few
operations on the extended real number system.


Definition 6.2.2 (Negation of extended reals). The operation of negation $x \mapsto -x$
on $\mathbf{R}$, we now extend to $\mathbf{R}^*$ by defining $-(+\infty) := -\infty$ and $-(-\infty) := +\infty$.


Thus every extended real number $x$ has a negation, and $-(-x)$ is always equal to
$x$.


Definition 6.2.3 (Ordering of extended reals). Let $x$ and $y$ be extended real numbers.
We say that $x \leq y$, i.e., $x$ is less than or equal to $y$, iff one of the following three
statements is true:

(a) $x$ and $y$ are real numbers, and $x \leq y$ as real numbers.
(b) $y = +\infty$.
(c) $x = -\infty$.


We say that $x < y$ if we have $x \leq y$ and $x \neq y$. We sometimes write $x < y$ as
$y > x$, and $x \leq y$ as $y \geq x$.


Example 6.2.4 $3 \leq 5$, $3 < +\infty$, and $-\infty < +\infty$, but $3 \not\leq -\infty$.

Some basic properties of order and negation on the extended real number system:


Proposition 6.2.5 Let $x, y, z$ be extended real numbers. Then the following statements
are true:


(a) (Reflexivity) We have $x \leq x$.

(b) (Trichotomy) Exactly one of the statements $x < y$, $x = y$, or $x > y$ is true.

(c) (Transitivity) If $x \leq y$ and $y \leq z$, then $x \leq z$.

(d) (Negation reverses order) If $x \leq y$, then $-y \leq -x$.


Proof See Exercise 6.2.1. 


One could also introduce other operations on the extended real number system,
such as addition and multiplication. However, this is somewhat dangerous as these
operations will almost certainly fail to obey the familiar rules of algebra. For instance,
to define addition it seems reasonable (given one's intuitive notion of infinity) to set
$+\infty + 5 = +\infty$ and $+\infty + 3 = +\infty$, but then this implies that $+\infty + 5 = +\infty +
3$, while $5 \neq 3$. So things like the cancelation law begin to break down once we try
to operate involving infinity. To avoid these issues we shall simply not define any
arithmetic operations on the extended real number system other than negation and
order.

Remember that we defined the notion of supremum or least upper bound of a
set $E$ of reals; this gave an extended real number $\sup(E)$, which was either finite or
infinite. We now extend this notion slightly.


Definition 6.2.6 (Supremum of sets of extended reals). Let $E$ be a subset of $\mathbf{R}^*$.
Then we define the supremum $\sup(E)$ or least upper bound of $E$ by the following
rule.


(a) If $E$ is contained in $\mathbf{R}$ (i.e., $+\infty$ and $-\infty$ are not elements of $E$), then we let
$\sup(E)$ be as defined in Definition 5.5.10.

(b) If $E$ contains $+\infty$, then we set $\sup(E) := +\infty$.

(c) If $E$ does not contain $+\infty$ but does contain $-\infty$, then we set $\sup(E) :=
\sup(E \setminus \{-\infty\})$ (which is a subset of $\mathbf{R}$ and thus falls under case (a)).


We also define the infimum $\inf(E)$ of $E$ (also known as the greatest lower bound
of $E$) by the formula

    $$\inf(E) := -\sup(-E)$$

where $-E$ is the set $-E := \{-x : x \in E\}$.

Example 6.2.7 Let $E$ be the negative integers, together with $-\infty$:

    $$E = \{-1, -2, -3, -4, \ldots\} \cup \{-\infty\}.$$

Then $\sup(E) = \sup(E \setminus \{-\infty\}) = -1$, while

    $$\inf(E) = -\sup(-E) = -(+\infty) = -\infty.$$


Example 6.2.8 The set $\{0.9, 0.99, 0.999, 0.9999, \ldots\}$ has infimum 0.9 and supremum
1. Note that in this case the supremum does not actually belong to the set, but
it is in some sense “touching it” from the right.


Example 6.2.9 The set $\{1, 2, 3, 4, 5, \ldots\}$ has infimum 1 and supremum $+\infty$.


Example 6.2.10 Let $E$ be the empty set. Then $\sup(E) = -\infty$ and $\inf(E) = +\infty$
(why?). This is the only case in which the supremum can be less than the infimum
(why?).
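
For finite sets the supremum is simply the largest element, so the conventions of Definition 6.2.6 and Example 6.2.10 can be mimicked in a few lines of Python. This is only a sketch under explicit assumptions: the names `ext_sup` and `ext_inf` are hypothetical, the floats `math.inf` and `-math.inf` stand in for $+\infty$ and $-\infty$, and only finite collections are handled (an infinite set such as the one in Example 6.2.7 can at best be truncated).

```python
import math

def ext_sup(E):
    """Supremum of a FINITE collection of extended reals.

    The empty collection gets -infinity, as in Example 6.2.10;
    otherwise the supremum of a finite set is its maximum.
    """
    items = list(E)
    if not items:
        return -math.inf
    return max(items)

def ext_inf(E):
    """Infimum via the formula inf(E) := -sup(-E) of Definition 6.2.6."""
    return -ext_sup(-x for x in E)

print(ext_sup([]), ext_inf([]))                    # empty set
truncated = [-1, -2, -3, -4, -math.inf]            # finite piece of Example 6.2.7
print(ext_sup(truncated), ext_inf(truncated))
```

For the empty collection the supremum is below the infimum, which is the anomaly pointed out in Example 6.2.10.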


One can intuitively think of the supremum of $E$ as follows. Imagine the real line
with $+\infty$ somehow on the far right, and $-\infty$ on the far left. Imagine a piston at $+\infty$
moving leftward until it is stopped by the presence of a set $E$; the location where
it stops is the supremum of $E$. Similarly if one imagines a piston at $-\infty$ moving
rightward until it is stopped by the presence of $E$, the location where it stops is the
infimum of $E$. In the case when $E$ is the empty set, the pistons pass through each
other, the supremum landing at $-\infty$ and the infimum landing at $+\infty$.

The following theorem justifies the terminology “least upper bound” and “greatest 
lower bound”: 


Theorem 6.2.11 Let $E$ be a subset of $\mathbf{R}^*$. Then the following statements are true.


(a) For every $x \in E$ we have $x \leq \sup(E)$ and $x \geq \inf(E)$.

(b) Suppose that $M \in \mathbf{R}^*$ is an upper bound for $E$, i.e., $x \leq M$ for all $x \in E$. Then
we have $\sup(E) \leq M$.

(c) Suppose that $M \in \mathbf{R}^*$ is a lower bound for $E$, i.e., $x \geq M$ for all $x \in E$. Then
we have $\inf(E) \geq M$.


Proof See Exercise 6.2.2. 


— Exercises — 
Exercise 6.2.1 Prove Proposition 6.2.5. (Hint: you may need Proposition 5.4.7.)

Exercise 6.2.2 Prove Theorem 6.2.11. (Hint: you may need to break into cases depending on
whether $+\infty$ or $-\infty$ belongs to $E$. You can of course use Definition 5.5.10, provided that $E$
consists only of real numbers.)




6.3 Suprema and Infima of Sequences 


Having defined the notion of a supremum and infimum of sets of reals, we can now 
also talk about the supremum and infimum of a sequence. 

Definition 6.3.1 (Suprema and infima of sequences). Let $(a_n)_{n=m}^\infty$ be a sequence of
real numbers. Then we define $\sup(a_n)_{n=m}^\infty$ to be the supremum of the set $\{a_n : n \geq m\}$,
and $\inf(a_n)_{n=m}^\infty$ to be the infimum of the same set $\{a_n : n \geq m\}$.


Remark 6.3.2 The quantities $\sup(a_n)_{n=m}^\infty$ and $\inf(a_n)_{n=m}^\infty$ are sometimes written as
$\sup_{n \geq m} a_n$ and $\inf_{n \geq m} a_n$ respectively.


Example 6.3.3 Let $a_n := (-1)^n$; thus $(a_n)_{n=1}^\infty$ is the sequence $-1, 1, -1, 1, \ldots$.
Then the set $\{a_n : n \geq 1\}$ is just the two-element set $\{-1, 1\}$, and hence $\sup(a_n)_{n=1}^\infty$
is equal to 1. Similarly $\inf(a_n)_{n=1}^\infty$ is equal to $-1$.


Example 6.3.4 Let $a_n := 1/n$; thus $(a_n)_{n=1}^\infty$ is the sequence $1, 1/2, 1/3, \ldots$. Then
the set $\{a_n : n \geq 1\}$ is the countable set $\{1, 1/2, 1/3, 1/4, \ldots\}$. Thus $\sup(a_n)_{n=1}^\infty = 1$
and $\inf(a_n)_{n=1}^\infty = 0$ (Exercise 6.3.1). Notice here that the infimum of the sequence is
not actually a member of the sequence, though it becomes very close to the sequence
eventually. (So it is a little inaccurate to think of the supremum and infimum as the
“largest element of the sequence” and “smallest element of the sequence”, respectively.)

Example 6.3.5 Let $a_n := n$; thus $(a_n)_{n=1}^\infty$ is the sequence $1, 2, 3, 4, \ldots$. Then the
set $\{a_n : n \geq 1\}$ is just the positive integers $\{1, 2, 3, 4, \ldots\}$. Then $\sup(a_n)_{n=1}^\infty = +\infty$
and $\inf(a_n)_{n=1}^\infty = 1$.


As the last example shows, it is possible for the supremum or infimum of a
sequence to be $+\infty$ or $-\infty$. However, if a sequence $(a_n)_{n=m}^\infty$ is bounded, say bounded
by $M$, then all the elements $a_n$ of the sequence lie between $-M$ and $M$, so that the
set $\{a_n : n \geq m\}$ has $M$ as an upper bound and $-M$ as a lower bound. Since this
set is clearly non-empty, we can thus conclude that the supremum and infimum of a
bounded sequence are real numbers (i.e., not $+\infty$ and $-\infty$).


Proposition 6.3.6 (Least upper bound property). Let $(a_n)_{n=m}^\infty$ be a sequence of real
numbers, and let $x$ be the extended real number $x := \sup(a_n)_{n=m}^\infty$. Then we have
$a_n \leq x$ for all $n \geq m$. Also, whenever $M \in \mathbf{R}^*$ is an upper bound for $a_n$ (i.e., $a_n \leq M$
for all $n \geq m$), we have $x \leq M$. Finally, for every extended real number $y$ for which
$y < x$, there exists at least one $n \geq m$ for which $y < a_n \leq x$.


Proof See Exercise 6.3.2. 


Remark 6.3.7 There is a corresponding Proposition for infima, but with all the references
to order reversed, e.g., all upper bounds should now be lower bounds, etc.
The proof is exactly the same.

Now we give an application of these concepts of supremum and infimum. In the
previous section we saw that all convergent sequences are bounded. It is natural to ask
whether the converse is true: are all bounded sequences convergent? The answer is
no; for instance, the sequence $1, -1, 1, -1, \ldots$ is bounded, but not Cauchy and hence
not convergent. However, if we make the sequence both bounded and monotone (i.e.,
increasing or decreasing), then it is true that it must converge:


Proposition 6.3.8 (Monotone bounded sequences converge). Let $(a_n)_{n=m}^\infty$ be a
sequence of real numbers which has some finite upper bound $M \in \mathbf{R}$, and which
is also increasing (i.e., $a_{n+1} \geq a_n$ for all $n \geq m$). Then $(a_n)_{n=m}^\infty$ is convergent, and
in fact

    $$\lim_{n \to \infty} a_n = \sup(a_n)_{n=m}^\infty \leq M.$$


Proof See Exercise 6.3.3. 


One can similarly prove that if a sequence $(a_n)_{n=m}^\infty$ is bounded below and decreasing
(i.e., $a_{n+1} \leq a_n$), then it is convergent, and that the limit is equal to the infimum.

A sequence is said to be monotone if it is either increasing or decreasing. From 
Proposition 6.3.8 and Corollary 6.1.17 we see that a monotone sequence converges 
if and only if it is bounded. 


Example 6.3.9 The sequence 3, 3.1, 3.14, 3.141, 3.1415, ... is increasing, and is 
bounded above by 4. Hence by Proposition 6.3.8 it must have a limit, which is a real 
number less than or equal to 4. 


Proposition 6.3.8 asserts that the limit of a monotone sequence exists, but does 
not directly say what that limit is. Nevertheless, with a little extra work one can often 
find the limit once one is given that the limit does exist. For instance: 


Proposition 6.3.10 Let $0 < x < 1$. Then we have $\lim_{n \to \infty} x^n = 0$.


Proof Since $0 < x < 1$, one can show that the sequence $(x^n)_{n=1}^\infty$ is decreasing
(why?). On the other hand, the sequence $(x^n)_{n=1}^\infty$ has a lower bound of 0. Thus by
Proposition 6.3.8 (for infima instead of suprema) the sequence $(x^n)_{n=1}^\infty$ converges to
some limit $L$. Since $x^{n+1} = x \times x^n$, we thus see from the limit laws (Theorem 6.1.19)
that $(x^{n+1})_{n=1}^\infty$ converges to $xL$. But the sequence $(x^{n+1})_{n=1}^\infty$ is just the sequence
$(x^n)_{n=2}^\infty$ shifted by one, and so they must have the same limits (why?). So $xL = L$.
Since $x \neq 1$, we can solve for $L$ to obtain $L = 0$. Thus $(x^n)_{n=1}^\infty$ converges to 0.


Note that this proof does not work when $x > 1$ (Exercise 6.3.4).
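
The two ingredients of the proof of Proposition 6.3.10 (the sequence is decreasing and bounded below by 0, and the shifted sequence is $x$ times the original) can be observed on finitely many terms. The Python sketch below is an illustration only: the helper `powers` and the sample value $x = 9/10$ are assumptions made here, and exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def powers(x: Fraction, count: int):
    """Return the list [x^1, x^2, ..., x^count] in exact arithmetic."""
    out, p = [], Fraction(1)
    for _ in range(count):
        p *= x
        out.append(p)
    return out

x = Fraction(9, 10)                  # sample value with 0 < x < 1
seq = powers(x, 200)

decreasing = all(later < earlier for earlier, later in zip(seq, seq[1:]))
bounded_below = all(term > 0 for term in seq)
# The shifted sequence (x^(n+1)) equals x times the sequence (x^n).
shift_identity = seq[1:] == [x * term for term in seq[:-1]]
print(decreasing, bounded_below, shift_identity, float(seq[-1]))
```

The last printed value is already tiny, in line with the limit $L = 0$ obtained from $xL = L$.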


— Exercises — 
Exercise 6.3.1 Verify the claim in Example 6.3.4.

Exercise 6.3.2 Prove Proposition 6.3.6. (Hint: use Theorem 6.2.11.)


Exercise 6.3.3 Prove Proposition 6.3.8. (Hint: use Proposition 6.3.6, together with the assumption
that $a_n$ is increasing, to show that $a_n$ converges to $\sup(a_n)_{n=m}^\infty$.)

Exercise 6.3.4 Explain why Proposition 6.3.10 fails when $x > 1$. In fact, show that the sequence
$(x^n)_{n=1}^\infty$ diverges when $x > 1$. (Hint: prove by contradiction and use the identity $(1/x)^n x^n = 1$ and
the limit laws in Theorem 6.1.19.) Compare this with the argument in Example 1.2.3; can you now
explain the flaws in the reasoning in that example?


6.4 Limsup, Liminf, and Limit Points 


Consider the sequence

    $$1.1, -1.01, 1.001, -1.0001, 1.00001, \ldots.$$

If one plots this sequence, then one sees (informally, of course) that this sequence
does not converge; half the time the sequence is getting close to 1, and half the time
the sequence is getting close to $-1$, but it is not converging to either of them; for
instance, it never gets eventually 1/2-close to 1, and never gets eventually 1/2-close
to $-1$. However, even though $-1$ and $+1$ are not quite limits of this sequence, it does
seem that in some vague way they “want” to be limits. To make this notion precise
we introduce the notion of a limit point.


Definition 6.4.1 (Limit points). Let $(a_n)_{n=m}^\infty$ be a sequence of real numbers, let $x$ be
a real number, and let $\varepsilon > 0$ be a real number. We say that $x$ is $\varepsilon$-adherent to $(a_n)_{n=m}^\infty$
iff there exists an $n \geq m$ such that $a_n$ is $\varepsilon$-close to $x$. We say that $x$ is continually
$\varepsilon$-adherent to $(a_n)_{n=m}^\infty$ iff it is $\varepsilon$-adherent to $(a_n)_{n=N}^\infty$ for every $N \geq m$. We say that $x$
is a limit point or adherent point of $(a_n)_{n=m}^\infty$ iff it is continually $\varepsilon$-adherent to $(a_n)_{n=m}^\infty$
for every $\varepsilon > 0$.


Remark 6.4.2 The verb “to adhere” means much the same as “‘to stick to”; hence 
the term “adhesive”. 

Unwrapping all the definitions, we see that x is a limit point of (a,)°°_,, if, for 
every € > 0 and every N > m, there exists ann > N such that |a, — x| < e. (Why 
is this the same definition?) Note the difference between a sequence being &-close to 
L (which means that al/ the elements of the sequence stay within a distance ¢ of L) 
and L being ¢-adherent to the sequence (which only needs a single element of the 
sequence to stay within a distance e of L). Also, for L to be continually e-adherent to 
(Gn )pe-m> it has to be e-adherent to (a,)7°.y for all N > m, whereas for (a,)°~,,, to be 
eventually e-close to L, we only need (a,,)° ,, to be e-close to L for some N > m. 
Thus there are some subtle differences in quantifiers between limits and limit points. 

Note that limit points are only defined for finite real numbers. It is also possible to 
rigorously define the concept of +-oo or —oo being a limit point; see Exercise 6.4.8. 
Example 6.4.3 Let $(a_n)_{n=1}^\infty$ denote the sequence

0.9, 0.99, 0.999, 0.9999, 0.99999, ....

The number 0.8 is 0.1-adherent to this sequence, since 0.8 is 0.1-close to 0.9, which
is a member of this sequence. However, it is not continually 0.1-adherent to this
sequence, since once one discards the first element of this sequence there is no
member of the sequence to be 0.1-close to. In particular, 0.8 is not a limit point of
this sequence. On the other hand, the number 1 is 0.1-adherent to this sequence, and
in fact is continually 0.1-adherent to this sequence, since no matter how many initial
members of the sequence one discards, there is still something for 1 to be 0.1-close
to. In fact, it is continually $\varepsilon$-adherent for every $\varepsilon > 0$, and is hence a limit point of this
sequence.
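
The claims in Example 6.4.3 can be explored numerically. The following Python sketch is only an illustration, not a proof: it inspects a finite window of each tail, whereas continual adherence is a statement about every tail of the infinite sequence. The helper names `a` and `is_adherent` and the window size are assumptions introduced here, not notation from the text.

```python
from fractions import Fraction

def a(n):
    """n-th term (n >= 1) of 0.9, 0.99, 0.999, ...; exactly 1 - 10^(-n)."""
    return 1 - Fraction(1, 10 ** n)

def is_adherent(x, eps, start, window=50):
    """Check whether some a_n with start <= n < start + window is eps-close to x.

    eps-close means |a_n - x| <= eps. Only a finite window is inspected,
    so a False result is evidence about the tail, not a proof.
    """
    return any(abs(a(n) - x) <= eps for n in range(start, start + window))

eps = Fraction(1, 10)
print(is_adherent(Fraction(8, 10), eps, start=1))  # a_1 = 0.9 is 0.1-close to 0.8
print(is_adherent(Fraction(8, 10), eps, start=2))  # tail after discarding a_1
print(all(is_adherent(1, eps, start=N) for N in range(1, 30)))
```

Exact rational arithmetic (`Fraction`) is used so that the borderline case $|0.9 - 0.8| = 0.1$ is not disturbed by floating-point rounding.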


Example 6.4.4 Now consider the sequence

1.1, −1.01, 1.001, −1.0001, 1.00001, ....

The number 1 is 0.1-adherent to this sequence; in fact it is continually 0.1-adherent to
this sequence, because no matter how many elements of the sequence one discards,
there are some elements of the sequence that 1 is 0.1-close to. (As discussed earlier,
one does not need all the elements to be 0.1-close to 1, just some; thus 0.1-adherent
is weaker than 0.1-close, and continually 0.1-adherent is a different notion from
eventually 0.1-close.) In fact, for every $\varepsilon > 0$, the number 1 is continually $\varepsilon$-adherent
to this sequence and is thus a limit point of this sequence. Similarly, −1 is a limit
point of this sequence; however 0 (say) is not a limit point of this sequence, since it
is not continually 0.1-adherent to it.


Limits are of course a special case of limit points: 


Proposition 6.4.5 (Limits are limit points). Let $(a_n)_{n=m}^\infty$ be a sequence which converges
to a real number $c$. Then $c$ is a limit point of $(a_n)_{n=m}^\infty$, and in fact it is the only
limit point of $(a_n)_{n=m}^\infty$.


Proof See Exercise 6.4.1. 


Now we will look at two special types of limit points: the limit superior (lim sup) 
and limit inferior (lim inf). 


Definition 6.4.6 (Limit superior and limit inferior). Suppose that $(a_n)_{n=m}^\infty$ is a
sequence. We define a new sequence $(a_N^+)_{N=m}^\infty$ by the formula

$$a_N^+ := \sup(a_n)_{n=N}^\infty.$$

More informally, $a_N^+$ is the supremum of all the elements in the sequence from
$a_N$ onwards. We then define the limit superior of the sequence $(a_n)_{n=m}^\infty$, denoted
$\limsup_{n \to \infty} a_n$, by the formula

$$\limsup_{n \to \infty} a_n := \inf(a_N^+)_{N=m}^\infty.$$

Similarly, we can define

$$a_N^- := \inf(a_n)_{n=N}^\infty$$

and define the limit inferior of the sequence $(a_n)_{n=m}^\infty$, denoted $\liminf_{n \to \infty} a_n$, by the
formula

$$\liminf_{n \to \infty} a_n := \sup(a_N^-)_{N=m}^\infty.$$


Example 6.4.7 Let $a_1, a_2, a_3, \ldots$ denote the sequence

1.1, −1.01, 1.001, −1.0001, 1.00001, ....

Then $a_1^+, a_2^+, a_3^+, \ldots$ is the sequence

1.1, 1.001, 1.001, 1.00001, 1.00001, ...

(why?), and its infimum is 1. Hence the limit superior of this sequence is 1. Similarly,
$a_1^-, a_2^-, a_3^-, \ldots$ is the sequence

−1.01, −1.01, −1.0001, −1.0001, −1.000001, ...

(why?), and the supremum of this sequence is −1. Hence the limit inferior of this
sequence is −1. One should compare this with the supremum and infimum of the
sequence, which are 1.1 and −1.01 respectively.
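
The tail suprema and infima of Example 6.4.7 can be tabulated with a short Python sketch. This is an illustration under a stated assumption: it works on a finite truncation of the sequence, so only the leading entries of the computed lists agree with the true values $a_N^+$ and $a_N^-$ (near the end of the truncation the tails are too short to be representative). The names `a`, `tail_sups` and `tail_infs` are introduced here for illustration.

```python
from fractions import Fraction

def a(n):
    """n-th term (n >= 1) of 1.1, -1.01, 1.001, -1.0001, ..."""
    return (-1) ** (n + 1) * (1 + Fraction(1, 10 ** n))

def tail_sups(terms):
    """For a finite list, return the list whose k-th entry is max(terms[k:])."""
    out, best = [], None
    for t in reversed(terms):
        best = t if best is None else max(best, t)
        out.append(best)
    return out[::-1]

def tail_infs(terms):
    """For a finite list, return the list whose k-th entry is min(terms[k:])."""
    return [-s for s in tail_sups([-t for t in terms])]

terms = [a(n) for n in range(1, 13)]   # truncation a_1, ..., a_12
sups = tail_sups(terms)
infs = tail_infs(terms)
print([float(s) for s in sups[:5]])
print([float(t) for t in infs[:5]])
```

The leading entries of `sups` decrease towards 1 and those of `infs` increase towards −1, consistent with the limit superior and limit inferior found in the example.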


Example 6.4.8 Let $a_1, a_2, a_3, \ldots$ denote the sequence

1, −2, 3, −4, 5, −6, 7, −8, ...

Then $a_1^+, a_2^+, \ldots$ is the sequence

$+\infty, +\infty, +\infty, +\infty, \ldots$

(why?) and so the limit superior is $+\infty$. Similarly, $a_1^-, a_2^-, \ldots$ is the sequence

$-\infty, -\infty, -\infty, -\infty, \ldots$

and so the limit inferior is $-\infty$.


Example 6.4.9 Let $a_1, a_2, a_3, \ldots$ denote the sequence

1, −1/2, 1/3, −1/4, 1/5, −1/6, ...

Then $a_1^+, a_2^+, \ldots$ is the sequence

1, 1/3, 1/3, 1/5, 1/5, 1/7, ...

which has an infimum of 0 (why?), so the limit superior is 0. Similarly, $a_1^-, a_2^-, \ldots$
is the sequence

−1/2, −1/2, −1/4, −1/4, −1/6, −1/6, ...

which has a supremum of 0. So the limit inferior is also 0.


Example 6.4.10 Let $a_1, a_2, a_3, \ldots$ denote the sequence

1, 2, 3, 4, 5, 6, ...

Then $a_1^+, a_2^+, \ldots$ is the sequence

$+\infty, +\infty, +\infty, \ldots$

so the limit superior is $+\infty$. Similarly, $a_1^-, a_2^-, \ldots$ is the sequence

1, 2, 3, 4, 5, ...

which has a supremum of $+\infty$. So the limit inferior is also $+\infty$.


Remark 6.4.11 Some authors use the notation $\overline{\lim}_{n \to \infty} a_n$ instead of $\limsup_{n \to \infty} a_n$,
and $\underline{\lim}_{n \to \infty} a_n$ instead of $\liminf_{n \to \infty} a_n$. Note that the starting index $m$ of the
sequence is irrelevant (see Exercise 6.4.2).


Returning to the piston analogy, imagine a piston at $+\infty$ moving leftward until
it is stopped by the presence of the sequence $a_1, a_2, \ldots$. The place it will stop is the
supremum of $a_1, a_2, a_3, \ldots$, which in our new notation is $a_1^+$. Now let us remove the
first element $a_1$ from the sequence; this may cause our piston to slip leftward, to a
new point $a_2^+$ (though in many cases the piston will not move and $a_2^+$ will just be
the same as $a_1^+$). Then we remove the second element $a_2$, causing the piston to slip a
little more. If we keep doing this the piston will keep slipping, but there will be some
point where it cannot go any further, and this is the limit superior of the sequence. A
similar analogy can describe the limit inferior of the sequence.

We now describe some basic properties of limit superior and limit inferior. 


Proposition 6.4.12 Let $(a_n)_{n=m}^\infty$ be a sequence of real numbers, let $L^+$ be the limit
superior of this sequence, and let $L^-$ be the limit inferior of this sequence (thus both
$L^+$ and $L^-$ are extended real numbers).

(a) For every $x > L^+$, there exists an $N \geq m$ such that $a_n < x$ for all $n \geq N$. (In
other words, for every $x > L^+$, the elements of the sequence $(a_n)_{n=m}^\infty$ are eventually
less than $x$.) Similarly, for every $y < L^-$ there exists an $N \geq m$ such that
$a_n > y$ for all $n \geq N$.

(b) For every $x < L^+$, and every $N \geq m$, there exists an $n \geq N$ such that $a_n > x$.
(In other words, for every $x < L^+$, the elements of the sequence $(a_n)_{n=m}^\infty$ exceed
$x$ infinitely often.) Similarly, for every $y > L^-$ and every $N \geq m$, there exists an
$n \geq N$ such that $a_n < y$.

(c) We have $\inf(a_n)_{n=m}^\infty \leq L^- \leq L^+ \leq \sup(a_n)_{n=m}^\infty$.

(d) If $c$ is any limit point of $(a_n)_{n=m}^\infty$, then we have $L^- \leq c \leq L^+$.

(e) If $L^+$ is finite, then it is a limit point of $(a_n)_{n=m}^\infty$. Similarly, if $L^-$ is finite, then it
is a limit point of $(a_n)_{n=m}^\infty$.

(f) Let $c$ be a real number. If $(a_n)_{n=m}^\infty$ converges to $c$, then we must have $L^+ = L^- = c$.
Conversely, if $L^+ = L^- = c$, then $(a_n)_{n=m}^\infty$ converges to $c$.


Proof We shall prove (a) and (b) and leave the remaining parts to the exercises.
Suppose first that $x > L^+$. Then by definition of $L^+$, we have $x > \inf(a_N^+)_{N=m}^\infty$. By
Proposition 6.3.6, there must then exist an integer $N \geq m$ such that $x > a_N^+$. By
definition of $a_N^+$, this means that $x > \sup(a_n)_{n=N}^\infty$. Thus by Proposition 6.3.6 again,
we have $x > a_n$ for all $n \geq N$, as desired. This proves the first part of (a); the second
part of (a) is proven similarly.

Now we prove (b). Suppose that $x < L^+$. Then we have $x < \inf(a_N^+)_{N=m}^\infty$. If we
fix any $N \geq m$, then by Proposition 6.3.6, we thus have $x < a_N^+$. By definition of $a_N^+$,
this means that $x < \sup(a_n)_{n=N}^\infty$. By Proposition 6.3.6 again, there must thus exist
$n \geq N$ such that $a_n > x$, as desired. This proves the first part of (b); the second part
of (b) is proven similarly.

The proofs of (c), (d), (e), (f) are left to Exercise 6.4.3.


Parts (d) and (e) of Proposition 6.4.12 say, in particular, that $L^+$ is the largest limit
point of $(a_n)_{n=m}^\infty$, and $L^-$ is the smallest limit point (provided that $L^+$ and $L^-$ are
finite). Proposition 6.4.12 (f) then says that if $L^+$ and $L^-$ coincide (so there is only
one limit point) and are finite, then the sequence in fact converges. This gives a way
to test if a sequence converges: compute its limit superior and limit inferior, and see
if they are equal.


We now give a basic comparison property of limit superior and limit inferior. 


Lemma 6.4.13 (Comparison principle). Suppose that $(a_n)_{n=m}^\infty$ and $(b_n)_{n=m}^\infty$ are two
sequences of real numbers such that $a_n \leq b_n$ for all $n \geq m$. Then we have the
inequalities

$$\sup(a_n)_{n=m}^\infty \leq \sup(b_n)_{n=m}^\infty$$

$$\inf(a_n)_{n=m}^\infty \leq \inf(b_n)_{n=m}^\infty$$

$$\limsup_{n \to \infty} a_n \leq \limsup_{n \to \infty} b_n$$

$$\liminf_{n \to \infty} a_n \leq \liminf_{n \to \infty} b_n$$


Proof See Exercise 6.4.4. 


Corollary 6.4.14 (Squeeze test). Let $(a_n)_{n=m}^\infty$, $(b_n)_{n=m}^\infty$, and $(c_n)_{n=m}^\infty$ be sequences
of real numbers such that

$$a_n \leq b_n \leq c_n$$

for all $n \geq m$. Suppose also that $(a_n)_{n=m}^\infty$ and $(c_n)_{n=m}^\infty$ both converge to the same
limit $L$. Then $(b_n)_{n=m}^\infty$ is also convergent to $L$.


Proof See Exercise 6.4.5.


Example 6.4.15 We already know (see Proposition 6.1.11) that $\lim_{n \to \infty} 1/n = 0$.
By the limit laws (Theorem 6.1.19), this also implies that $\lim_{n \to \infty} 2/n = 0$ and
$\lim_{n \to \infty} -2/n = 0$. The squeeze test then shows that any sequence $(b_n)_{n=1}^\infty$ for which

$$-2/n \leq b_n \leq 2/n \text{ for all } n \geq 1$$

is convergent to 0. For instance, we can use this to show that the sequence $(-1)^n/n + 1/n^2$
converges to zero, or that $2^{-n}$ converges to zero. Note one can use induction to
show that $0 \leq 2^{-n} \leq 1/n$ for all $n \geq 1$.
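
The two squeezing inequalities used in Example 6.4.15 can be spot-checked in Python. The sketch below verifies them only for the first `N` indices (an arbitrary cut-off chosen here), so it is a sanity check rather than a substitute for the induction argument; the function names `b` and `c` are illustrative.

```python
from fractions import Fraction

def b(n):
    """b_n = (-1)^n / n + 1 / n^2, computed exactly."""
    return Fraction((-1) ** n, n) + Fraction(1, n * n)

def c(n):
    """c_n = 2^(-n), computed exactly."""
    return Fraction(1, 2 ** n)

N = 200  # number of indices inspected; an arbitrary choice for this sketch
squeezed_b = all(Fraction(-2, n) <= b(n) <= Fraction(2, n) for n in range(1, N + 1))
squeezed_c = all(0 <= c(n) <= Fraction(1, n) for n in range(1, N + 1))
print(squeezed_b, squeezed_c)
```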


Remark 6.4.16 The squeeze test, combined with the limit laws and the principle 
that monotone bounded sequences always have limits, allows one to compute a large 
number of limits. We give some examples in the next chapter. 


One commonly used consequence of the squeeze test is 


Corollary 6.4.17 (Zero test for sequences). Let $(a_n)_{n=M}^\infty$ be a sequence of real numbers.
Then the limit $\lim_{n \to \infty} a_n$ exists and is equal to zero if and only if the limit
$\lim_{n \to \infty} |a_n|$ exists and is equal to zero.


Proof See Exercise 6.4.7. 


We close this section with the following improvement to Proposition 6.1.12. 


Theorem 6.4.18 (Completeness of the reals). A sequence $(a_n)_{n=1}^\infty$ of real numbers
is a Cauchy sequence if and only if it is convergent.


Remark 6.4.19 Note that while this is very similar in spirit to Proposition 6.1.15, it 
is a bit more general, since Proposition 6.1.15 refers to Cauchy sequences of rationals 
instead of real numbers. 


Proof Proposition 6.1.12 already tells us that every convergent sequence is Cauchy, 
so it suffices to show that every Cauchy sequence is convergent. 

Let $(a_n)_{n=1}^\infty$ be a Cauchy sequence. We know from Lemma 5.1.15 (or more
precisely, from the extension of this lemma to the real numbers, which is proven
in exactly the same fashion) that the sequence $(a_n)_{n=1}^\infty$ is bounded; by Lemma
6.4.13 (or Proposition 6.4.12(c)) this implies that $L^- := \liminf_{n \to \infty} a_n$ and
$L^+ := \limsup_{n \to \infty} a_n$ of the sequence are both finite. To show that the sequence converges,
it will suffice by Proposition 6.4.12(f) to show that $L^- = L^+$.

Now let $\varepsilon > 0$ be any real number. Since $(a_n)_{n=1}^\infty$ is a Cauchy sequence, it must
be eventually $\varepsilon$-steady, so in particular there exists an $N \geq 1$ such that the sequence
$(a_n)_{n=N}^\infty$ is $\varepsilon$-steady. In particular, we have $a_N - \varepsilon \leq a_n \leq a_N + \varepsilon$ for all $n \geq N$. By
Proposition 6.3.6 (or Lemma 6.4.13) this implies that

$$a_N - \varepsilon \leq \inf(a_n)_{n=N}^\infty \leq \sup(a_n)_{n=N}^\infty \leq a_N + \varepsilon$$

and hence by the definition of $L^-$ and $L^+$ (and Proposition 6.3.6 again)

$$a_N - \varepsilon \leq L^- \leq L^+ \leq a_N + \varepsilon.$$

Thus we have

$$0 \leq L^+ - L^- \leq 2\varepsilon.$$

But this is true for all $\varepsilon > 0$, and $L^+$ and $L^-$ do not depend on $\varepsilon$; so we must therefore
have $L^+ = L^-$. (If $L^+ > L^-$ then we could set $\varepsilon := (L^+ - L^-)/3$ and obtain a
contradiction.) By Proposition 6.4.12(f) we thus see that the sequence converges.


Remark 6.4.20 In the language of metric spaces (see Chap. 1 of Analysis II), Theorem
6.4.18 asserts that the real numbers are a complete metric space, that is, that they do
not contain “holes” the same way the rationals do. (Certainly the rationals have lots
of Cauchy sequences which do not converge to other rationals; take for instance the
sequence 1, 1.4, 1.41, 1.414, 1.4142, ... which converges to the irrational $\sqrt{2}$.) This
property is closely related to the least upper bound property (Theorem 5.5.9), and
is one of the principal characteristics which make the real numbers superior to the
rational numbers for the purposes of doing analysis (taking limits, taking derivatives
and integrals, finding zeroes of functions, that kind of thing), as we shall see in later
chapters.


— Exercises — 
Exercise 6.4.1 Prove Proposition 6.4.5. 


Exercise 6.4.2 State and prove analogues of Exercises 6.1.3 and 6.1.4 for limit points, limit superior, 
and limit inferior. 


Exercise 6.4.3 Prove parts (c), (d), (e), (f) of Proposition 6.4.12. (Hint: you can use earlier parts 
of the proposition to prove later ones.) 


Exercise 6.4.4. Prove Lemma 6.4.13. 
Exercise 6.4.5 Use Lemma 6.4.13 to prove Corollary 6.4.14. 


Exercise 6.4.6 Give an example of two bounded sequences $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ such that $a_n < b_n$
for all $n \geq 1$, but that $\sup(a_n)_{n=1}^\infty \not< \sup(b_n)_{n=1}^\infty$. Explain why this does not contradict Lemma
6.4.13.


Exercise 6.4.7 Prove Corollary 6.4.17. Is the corollary still true if we replace zero in the statement 
of this corollary by some other number? 


Exercise 6.4.8 Let us say that a sequence $(a_n)_{n=M}^\infty$ of real numbers has $+\infty$ as a limit point iff
it has no finite upper bound, and that it has $-\infty$ as a limit point iff it has no finite lower bound.
With this definition, show that $\limsup_{n \to \infty} a_n$ is a limit point of $(a_n)_{n=M}^\infty$, and furthermore that it is
larger than all the other limit points of $(a_n)_{n=M}^\infty$; in other words, the limit superior is the largest limit
point of a sequence. Similarly, show that the limit inferior is the smallest limit point of a sequence.
(One can use Proposition 6.4.12 in the course of the proof.)


Exercise 6.4.9 Using the definition in Exercise 6.4.8, construct a sequence $(a_n)_{n=1}^\infty$ which has
exactly three limit points, at $-\infty$, 0, and $+\infty$.

Exercise 6.4.10 Let $(a_n)_{n=N}^\infty$ be a sequence of real numbers, and let $(b_m)_{m=M}^\infty$ be another sequence
of real numbers such that each $b_m$ is a limit point of $(a_n)_{n=N}^\infty$. Let $c$ be a limit point of $(b_m)_{m=M}^\infty$.
Prove that $c$ is also a limit point of $(a_n)_{n=N}^\infty$. (In other words, limit points of limit points are
themselves limit points of the original sequence.)


6.5 Some Standard Limits 


Armed now with the limit laws and the squeeze test, we can now compute a large 
number of limits. 


A particularly simple limit is that of the constant sequence $c, c, c, c, \ldots$; we clearly
have

$$\lim_{n \to \infty} c = c$$

for any constant $c$ (why?).

Also, in Proposition 6.1.11, we proved that $\lim_{n \to \infty} 1/n = 0$. This now implies


Corollary 6.5.1 We have $\lim_{n \to \infty} 1/n^{1/k} = 0$ for every integer $k \geq 1$.


Proof From Lemma 5.6.6 we know that $1/n^{1/k}$ is a decreasing function of $n$, while
being bounded below by 0. By Proposition 6.3.8 (for decreasing sequences instead
of increasing sequences) we thus know that this sequence converges to some limit
$L \geq 0$:

$$L = \lim_{n \to \infty} 1/n^{1/k}.$$

Raising this to the $k$th power and using the limit laws (or more precisely, Theorem
6.1.19(b) and induction), we obtain

$$L^k = \lim_{n \to \infty} 1/n.$$

By Proposition 6.1.11 we thus have $L^k = 0$; but this means that $L$ cannot be positive
(else $L^k$ would be positive), so $L = 0$, and we are done.


Some other basic limits: 


Lemma 6.5.2. Let $x$ be a real number. Then the limit $\lim_{n \to \infty} x^n$ exists and is equal
to zero when $|x| < 1$, exists and is equal to 1 when $x = 1$, and diverges when $x = -1$
or when $|x| > 1$.


Proof See Exercise 6.5.2. 


Lemma 6.5.3 For any $x > 0$, we have $\lim_{n \to \infty} x^{1/n} = 1$.


Proof See Exercise 6.5.3.

We will derive a few more standard limits later on, once we develop the root and
ratio tests for series and for sequences.
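
The behaviour described in Lemmas 6.5.2 and 6.5.3 can be observed numerically. The Python sketch below is a floating-point illustration only (the lemmas concern exact real numbers, and the proofs are left to the exercises); the function names and the sample values of $x$ and $n$ are choices made here.

```python
def power_term(x, n):
    """The n-th term x^n of the sequence in Lemma 6.5.2."""
    return x ** n

def root_term(x, n):
    """Floating-point approximation to the n-th term x^(1/n) of Lemma 6.5.3."""
    return x ** (1.0 / n)

for n in (1, 10, 100, 1000):
    # 0.5^n shrinks towards 0, while 10^(1/n) and 0.1^(1/n) both approach 1.
    print(n, power_term(0.5, n), root_term(10.0, n), root_term(0.1, n))

# For x = -1 the terms alternate and never settle down.
print([power_term(-1, n) for n in range(1, 5)])
```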


— Exercises — 


Exercise 6.5.1 Show that $\lim_{n \to \infty} 1/n^q = 0$ for any rational $q > 0$. (Hint: use Corollary 6.5.1 and
the limit laws, Theorem 6.1.19.) Conclude that the limit $\lim_{n \to \infty} n^q$ does not exist. (Hint: argue by
contradiction using Theorem 6.1.19(e).)


Exercise 6.5.2. Prove Lemma 6.5.2. (Hint: use Proposition 6.3.10, Exercise 6.3.4, and the squeeze
test.)


Exercise 6.5.3 Prove Lemma 6.5.3. (Hint: you may need to treat the cases $x \geq 1$ and $x < 1$ separately.
You might wish to first use Lemma 6.5.2 to prove the preliminary result that for every $\varepsilon > 0$
and every real number $M > 0$, there exists an $n$ such that $M^{1/n} \leq 1 + \varepsilon$.)


6.6 Subsequences 


This chapter has been devoted to the study of sequences $(a_n)_{n=m}^\infty$ of real numbers,
and their limits. Some sequences were convergent to a single limit, while others had
multiple limit points. For instance, the sequence


1.1, 0.1, 1.01, 0.01, 1.001, 0.001, 1.0001, ... 


has two limit points at 0 and 1 (which are incidentally also the lim inf and lim sup 
respectively), but is not actually convergent (since the lim sup and lim inf are not 
equal). However, while this sequence is not convergent, it does appear to contain 
convergent components; it seems to be a mixture of two convergent subsequences, 
namely 

1.1, 1.01, 1.001,... 


and 
0.1, 0.01, 0.001, .... 


To make this notion more precise, we need a notion of subsequence. 


Definition 6.6.1 (Subsequences). Let $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ be sequences of real numbers.
We say that $(b_n)_{n=0}^\infty$ is a subsequence of $(a_n)_{n=0}^\infty$ iff there exists a function
$f: \mathbf{N} \to \mathbf{N}$ which is strictly increasing (i.e., $f(n + 1) > f(n)$ for all $n \in \mathbf{N}$) such
that

$$b_n = a_{f(n)} \text{ for all } n \in \mathbf{N}.$$

More generally, we say that $(b_n)_{n=m'}^\infty$ is a subsequence of $(a_n)_{n=m}^\infty$ if there exists a
strictly increasing function $f: \{n \in \mathbf{N} : n \geq m'\} \to \{n \in \mathbf{N} : n \geq m\}$ such that
$b_n = a_{f(n)}$ for all $n \in \mathbf{N}$ with $n \geq m'$.

Example 6.6.2 If $(a_n)_{n=0}^\infty$ is a sequence, then $(a_{2n})_{n=0}^\infty$ is a subsequence of $(a_n)_{n=0}^\infty$,
since the function $f: \mathbf{N} \to \mathbf{N}$ defined by $f(n) := 2n$ is a strictly increasing function
from $\mathbf{N}$ to $\mathbf{N}$. Note that we do not assume $f$ to be bijective, although it is necessarily
injective (why?). More informally, the sequence

$$a_0, a_2, a_4, a_6, \ldots$$

is a subsequence of

$$a_0, a_1, a_2, a_3, a_4, \ldots.$$


Example 6.6.3. The two sequences 
1.1, 1.01, 1.001,... 


and 
0.1,0.01,0.001,... 


mentioned earlier are both subsequences of 
1.1, 0.1, 1.01, 0.01, 1.001, 0.001, 1.0001, ... 
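
To make Definition 6.6.1 concrete, here is a small Python sketch that models a sequence as a function of its index and a subsequence as composition with a strictly increasing index map $f$. The names `a`, `subsequence`, `evens` and `odds` are introduced here for illustration, and the closed form used for $a_n$ is an assumption read off from the listed terms.

```python
from fractions import Fraction

def a(n):
    """Term a_n (n >= 0) of 1.1, 0.1, 1.01, 0.01, 1.001, 0.001, ..."""
    small = Fraction(1, 10 ** (n // 2 + 1))
    return 1 + small if n % 2 == 0 else small

def subsequence(seq, f):
    """Return the sequence b_n := seq(f(n)); f should be strictly increasing."""
    return lambda n: seq(f(n))

def is_strictly_increasing(f, count):
    """Check f(n + 1) > f(n) for n = 0, ..., count - 1 (a finite spot check)."""
    return all(f(n + 1) > f(n) for n in range(count))

evens = lambda n: 2 * n        # picks a_0, a_2, a_4, ...
odds = lambda n: 2 * n + 1     # picks a_1, a_3, a_5, ...

upper = subsequence(a, evens)  # 1.1, 1.01, 1.001, ...
lower = subsequence(a, odds)   # 0.1, 0.01, 0.001, ...
print([float(upper(n)) for n in range(3)])
print([float(lower(n)) for n in range(3)])
```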


The property of being a subsequence is reflexive and transitive, though not sym- 
metric: 


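Lemma 6.6.4 Let $(a_n)_{n=0}^\infty$, $(b_n)_{n=0}^\infty$, and $(c_n)_{n=0}^\infty$ be sequences of real numbers. Then
$(a_n)_{n=0}^\infty$ is a subsequence of $(a_n)_{n=0}^\infty$. Furthermore, if $(b_n)_{n=0}^\infty$ is a subsequence of
$(a_n)_{n=0}^\infty$, and $(c_n)_{n=0}^\infty$ is a subsequence of $(b_n)_{n=0}^\infty$, then $(c_n)_{n=0}^\infty$ is a subsequence of
$(a_n)_{n=0}^\infty$.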


Proof See Exercise 6.6.1. 


We now relate the concept of subsequences to the concept of limits and limit 
points. 


Proposition 6.6.5 (Subsequences related to limits). Let $(a_n)_{n=0}^\infty$ be a sequence of
real numbers, and let $L$ be a real number. Then the following two statements are
logically equivalent (each one implies the other):


(a) The sequence $(a_n)_{n=0}^\infty$ converges to $L$.
(b) Every subsequence of $(a_n)_{n=0}^\infty$ converges to $L$.


Proof See Exercise 6.6.4. 


Proposition 6.6.6 (Subsequences related to limit points). Let $(a_n)_{n=0}^\infty$ be a sequence
of real numbers, and let $L$ be a real number. Then the following two statements are
logically equivalent.


(a) $L$ is a limit point of $(a_n)_{n=0}^\infty$.
(b) There exists a subsequence of $(a_n)_{n=0}^\infty$ which converges to $L$.


Proof See Exercise 6.6.5. 


Remark 6.6.7 The above two propositions give a sharp contrast between the notion 
of a limit and that of a limit point. When a sequence has a limit L, then all subse- 
quences also converge to L. But when a sequence has L as a limit point, then only 
some subsequences converge to L. 


We can now prove an important theorem in real analysis, due to Bernard Bolzano 
(1781-1848) and Karl Weierstrass (1815-1897): every bounded sequence has a con- 
vergent subsequence. 


Theorem 6.6.8 (Bolzano–Weierstrass theorem) Let $(a_n)_{n=0}^\infty$ be a bounded sequence
(i.e., there exists a real number $M > 0$ such that $|a_n| \leq M$ for all $n \in \mathbf{N}$). Then there
is at least one subsequence of $(a_n)_{n=0}^\infty$ which converges.


Proof Let $L$ be the limit superior of the sequence $(a_n)_{n=0}^\infty$. Since we have
$-M \leq a_n \leq M$ for all natural numbers $n$, it follows from the comparison principle (Lemma
6.4.13) that $-M \leq L \leq M$. In particular, $L$ is a real number (not $+\infty$ or $-\infty$). By
Proposition 6.4.12(e), $L$ is thus a limit point of $(a_n)_{n=0}^\infty$. Thus by Proposition 6.6.6,
there exists a subsequence of $(a_n)_{n=0}^\infty$ which converges (in fact, it converges to $L$).


Note that we could as well have used the limit inferior instead of the limit superior 
in the above argument. 


Remark 6.6.9 The Bolzano–Weierstrass theorem says that if a sequence is bounded,
then eventually it has no choice but to converge in some places; it has “no room” to
spread out and stop itself from acquiring limit points. It is not true for unbounded
sequences; for instance, the sequence 1, 2, 3, ... has no convergent subsequences
whatsoever (why?). In the language of topology, this means that the interval
$\{x \in \mathbf{R} : -M \leq x \leq M\}$ is compact, whereas an unbounded set such as the real line $\mathbf{R}$
is not compact. The distinction between compact sets and non-compact sets will be
very important in later chapters, of similar importance to the distinction between
finite sets and infinite sets.


— Exercises — 


Exercise 6.6.1 Prove Lemma 6.6.4. 


Exercise 6.6.2 Can you find two sequences $(a_n)_{n=0}^\infty$ and $(b_n)_{n=0}^\infty$ which are not the same sequence,
but such that each is a subsequence of the other?


Exercise 6.6.3 (For this exercise you may assume the well-ordering principle, Proposition 8.1.4.)
Let $(a_n)_{n=0}^\infty$ be a sequence which is not bounded. Show that there exists a subsequence $(b_n)_{n=0}^\infty$
of $(a_n)_{n=0}^\infty$ such that $\lim_{n \to \infty} 1/b_n$ exists and is equal to zero. (Hint: for each natural number $j$,
recursively introduce the quantity $n_j := \min\{n \in \mathbf{N} : |a_n| \geq j;\ n > n_{j-1}\}$ (omitting the condition
$n > n_{j-1}$ when $j = 0$), first explaining why the set $\{n \in \mathbf{N} : |a_n| \geq j;\ n > n_{j-1}\}$ is non-empty.
Then set $b_j := a_{n_j}$. To ensure the existence and uniqueness of the minimum, one either needs to
invoke the well-ordering principle (which we have placed in Proposition 8.1.4, but whose proof
does not rely on any material not already presented), or the least upper bound principle (Theorem
5.5.9).)

Exercise 6.6.4 Prove Proposition 6.6.5. (Note that one of the two implications has a very short
proof.)


Exercise 6.6.5 Prove Proposition 6.6.6. (Hint: to show that (a) implies (b), define the numbers $n_j$ for
each natural number $j \geq 1$ by the formula $n_j := \min\{n > n_{j-1} : |a_n - L| \leq 1/j\}$, with the convention
$n_0 := 0$, explaining why the set $\{n > n_{j-1} : |a_n - L| \leq 1/j\}$ is non-empty. Then consider the
sequence $a_{n_j}$.)
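
The index construction in the hint to Exercise 6.6.5 can be traced on a concrete sequence. The Python sketch below is an illustration only and does not replace the proof asked for in the exercise: it uses the sequence 1.1, 0.1, 1.01, 0.01, ... from the start of this section (with a closed form assumed here), and the name `hint_indices` and the `search_limit` safeguard are inventions of this sketch.

```python
from fractions import Fraction

def a(n):
    """Term a_n (n >= 0) of 1.1, 0.1, 1.01, 0.01, 1.001, 0.001, ..."""
    small = Fraction(1, 10 ** (n // 2 + 1))
    return 1 + small if n % 2 == 0 else small

def hint_indices(L, count, search_limit=10_000):
    """Return [n_1, ..., n_count], where n_0 := 0 and
    n_j := min{n > n_(j-1) : |a_n - L| <= 1/j}."""
    indices, previous = [], 0
    for j in range(1, count + 1):
        n = previous + 1
        while abs(a(n) - L) > Fraction(1, j):
            n += 1
            if n > search_limit:
                # A computer search cannot run forever; give up eventually.
                raise ValueError("no index found; L may not be a limit point")
        indices.append(n)
        previous = n
    return indices

print(hint_indices(0, 4))  # indices of a subsequence approaching the limit point 0
print(hint_indices(1, 4))  # indices of a subsequence approaching the limit point 1
```

By construction the chosen indices are strictly increasing and $|a_{n_j} - L| \leq 1/j$, which is the shape of estimate the hint is steering towards.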


6.7 Real Exponentiation, Part II 


We finally return to the topic of exponentiation of real numbers that we started in
Sect. 5.6. In that section we defined $x^q$ for all rational $q$ and positive real numbers $x$,
but we have not yet defined $x^\alpha$ when $\alpha$ is real. We now rectify this situation using
limits (in a similar way as to how we defined all the other standard operations on the
real numbers). First, we need a lemma:


Lemma 6.7.1 (Continuity of exponentiation) Let $x > 0$, and let $\alpha$ be a real number.
Let $(q_n)_{n=1}^\infty$ be any sequence of rational numbers converging to $\alpha$. Then $(x^{q_n})_{n=1}^\infty$ is
also a convergent sequence. Furthermore, if $(q'_n)_{n=1}^\infty$ is any other sequence of rational
numbers converging to $\alpha$, then $(x^{q'_n})_{n=1}^\infty$ has the same limit as $(x^{q_n})_{n=1}^\infty$:

$$\lim_{n \to \infty} x^{q_n} = \lim_{n \to \infty} x^{q'_n}.$$

Proof There are three cases: $x < 1$, $x = 1$, and $x > 1$. The case $x = 1$ is rather easy
(because then $x^q = 1$ for all rational $q$). We shall just do the case $x > 1$, and leave
the case $x < 1$ (which is very similar) to the reader.

Let us first prove that $(x^{q_n})_{n=1}^\infty$ converges. By Theorem 6.4.18 it is enough to
show that $(x^{q_n})_{n=1}^\infty$ is a Cauchy sequence.

To do this, we need to estimate the distance between $x^{q_n}$ and $x^{q_m}$; let us say for
the time being that $q_n \geq q_m$, so that $x^{q_n} \geq x^{q_m}$ (since $x > 1$). We have

$$d(x^{q_n}, x^{q_m}) = x^{q_n} - x^{q_m} = x^{q_m}(x^{q_n - q_m} - 1).$$

Since $(q_n)_{n=1}^\infty$ is a convergent sequence, it has some upper bound $M$; since $x > 1$,
we have $x^{q_m} \leq x^M$. Thus

$$d(x^{q_n}, x^{q_m}) = |x^{q_n} - x^{q_m}| \leq x^M (x^{q_n - q_m} - 1).$$

Now let $\varepsilon > 0$. We know by Lemma 6.5.3 that the sequence $(x^{1/k})_{k=1}^\infty$ is eventually
$\varepsilon x^{-M}$-close to 1. Thus there exists some $K \geq 1$ such that

$$|x^{1/K} - 1| \leq \varepsilon x^{-M}.$$

Now since $(q_n)_{n=1}^\infty$ is convergent, it is a Cauchy sequence, and so there is an $N \geq 1$
such that $q_n$ and $q_m$ are $1/K$-close for all $n, m \geq N$. Thus we have

$$d(x^{q_n}, x^{q_m}) \leq x^M (x^{q_n - q_m} - 1) \leq x^M (x^{1/K} - 1) \leq x^M \varepsilon x^{-M} = \varepsilon$$

for every $n, m \geq N$ such that $q_n \geq q_m$. By symmetry we also have this bound when
$n, m \geq N$ and $q_n \leq q_m$. Thus the sequence $(x^{q_n})_{n=N}^\infty$ is $\varepsilon$-steady. Thus the sequence
$(x^{q_n})_{n=1}^\infty$ is eventually $\varepsilon$-steady for every $\varepsilon > 0$, and is thus a Cauchy sequence as
desired. This proves the convergence of $(x^{q_n})_{n=1}^\infty$.


Now we prove the second claim. It will suffice to show that

$$\lim_{n \to \infty} x^{q_n - q'_n} = 1,$$

since the claim would then follow from limit laws (since $x^{q_n} = x^{q_n - q'_n} x^{q'_n}$).

Write $r_n := q_n - q'_n$; by limit laws we know that $(r_n)_{n=1}^\infty$ converges to 0. We have
to show that for every $\varepsilon > 0$, the sequence $(x^{r_n})_{n=1}^\infty$ is eventually $\varepsilon$-close to 1. But
from Lemma 6.5.3 we know that the sequence $(x^{1/k})_{k=1}^\infty$ is eventually $\varepsilon$-close to 1.
Since $\lim_{k \to \infty} x^{-1/k}$ is also equal to 1 by Lemma 6.5.3, we know that $(x^{-1/k})_{k=1}^\infty$ is
also eventually $\varepsilon$-close to 1. Thus we can find a $K$ such that $x^{1/K}$ and $x^{-1/K}$ are both
$\varepsilon$-close to 1. But since $(r_n)_{n=1}^\infty$ is convergent to 0, it is eventually $1/K$-close to 0, so
that eventually $-1/K \leq r_n \leq 1/K$, and thus $x^{-1/K} \leq x^{r_n} \leq x^{1/K}$. In particular $x^{r_n}$
is also eventually $\varepsilon$-close to 1 (see Proposition 4.3.7(f)), as desired.


We may now make the following definition. 


Definition 6.7.2 (Exponentiation to a real exponent). Let $x > 0$ be real, and let $\alpha$
be a real number. We define the quantity $x^\alpha$ by the formula $x^\alpha = \lim_{n \to \infty} x^{q_n}$, where
$(q_n)_{n=1}^\infty$ is any sequence of rational numbers converging to $\alpha$.


Let us check that this definition is well-defined. First of all, given any real number
$\alpha$ we always have at least one sequence $(q_n)_{n=1}^\infty$ of rational numbers converging to
$\alpha$, by the definition of real numbers (and Proposition 6.1.15). Secondly, given any
such sequence $(q_n)_{n=1}^\infty$, the limit $\lim_{n \to \infty} x^{q_n}$ exists by Lemma 6.7.1. Finally, even
though there can be multiple choices for the sequence $(q_n)_{n=1}^\infty$, they all give the same
limit by Lemma 6.7.1. Thus this definition is well-defined.

If $\alpha$ is not just real but rational, i.e., $\alpha = q$ for some rational $q$, then this definition
could in principle be inconsistent with our earlier definition of exponentiation in
Section 5.6. But in this case $\alpha$ is clearly the limit of the sequence $(q)_{n=1}^\infty$, so by definition
$x^\alpha = \lim_{n \to \infty} x^q = x^q$. Thus the new definition of exponentiation is consistent
with the old one.
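
As a numerical illustration of why the choice of rational sequence does not matter, the Python sketch below approximates $2^{\sqrt{2}}$ using two different rational sequences converging to $\sqrt{2}$, one from below and one from above. Everything here is an assumption of the sketch: the names `sqrt2_below`, `sqrt2_above` and `rational_power` are made up, and $x^q$ is evaluated in floating point rather than by the exact definition from Section 5.6.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_below(n):
    """Rational q_n = floor(sqrt(2) * 10^n) / 10^n, approaching sqrt(2) from below."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def sqrt2_above(n):
    """Rational q'_n = q_n + 10^(-n), approaching sqrt(2) from above."""
    return sqrt2_below(n) + Fraction(1, 10 ** n)

def rational_power(x, q):
    """Floating-point stand-in for x^q with rational exponent q."""
    return x ** float(q)

for n in (1, 3, 6, 12):
    low = rational_power(2.0, sqrt2_below(n))
    high = rational_power(2.0, sqrt2_above(n))
    print(n, low, high, high - low)
```

The gap between the two columns shrinks as $n$ grows, which is the numerical shadow of the statement that both sequences $(x^{q_n})$ and $(x^{q'_n})$ have the same limit.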


Proposition 6.7.3 All the results of Lemma 5.6.9, which held for rational numbers
$q$ and $r$, continue to hold for real numbers $q$ and $r$.


Proof We demonstrate this for the identity $x^{q+r} = x^q x^r$ (i.e., the first part of Lemma
5.6.9(b)); the other parts are similar and are left to Exercise 6.7.1. The idea is to start
with Lemma 5.6.9 for rationals and then take limits to obtain the corresponding
results for reals.

Let $q$ and $r$ be real numbers. Then we can write $q = \lim_{n \to \infty} q_n$ and
$r = \lim_{n \to \infty} r_n$ for some sequences $(q_n)_{n=1}^\infty$ and $(r_n)_{n=1}^\infty$ of rationals, by the definition
of real numbers (and Proposition 6.1.15). Then by the limit laws, $q + r$ is the limit
of $(q_n + r_n)_{n=1}^\infty$. By definition of real exponentiation, we have

$$x^{q+r} = \lim_{n \to \infty} x^{q_n + r_n}; \quad x^q = \lim_{n \to \infty} x^{q_n}; \quad x^r = \lim_{n \to \infty} x^{r_n}.$$

But by Lemma 5.6.9(b) (applied to rational exponents) we have $x^{q_n + r_n} = x^{q_n} x^{r_n}$.
Thus by limit laws we have $x^{q+r} = x^q x^r$, as desired.


— Exercises — 


Exercise 6.7.1 Prove the remaining components of Proposition 6.7.3. 


Chapter 7
Series


Now that we have developed a reasonable theory of limits of sequences, we will use
that theory to develop a theory of infinite series

$$\sum_{n=m}^\infty a_n = a_m + a_{m+1} + a_{m+2} + \cdots.$$

But before we develop infinite series, we must first develop the theory of finite series. 


7.1 Finite Series 


Definition 7.1.1 (Finite series) Let $m, n$ be integers, and let $(a_i)_{i=m}^n$ be a finite
sequence of real numbers, assigning a real number $a_i$ to each integer $i$ between
$m$ and $n$ inclusive (i.e., $m \leq i \leq n$). Then we define the finite sum (or finite series)
$\sum_{i=m}^n a_i$ by the recursive formula

$$\sum_{i=m}^n a_i := 0 \text{ whenever } n < m;$$

$$\sum_{i=m}^{n+1} a_i := \left( \sum_{i=m}^n a_i \right) + a_{n+1} \text{ whenever } n \geq m - 1.$$

Thus for instance we have the identities

$$\sum_{i=m}^{m-2} a_i = 0; \quad \sum_{i=m}^{m-1} a_i = 0; \quad \sum_{i=m}^{m} a_i = a_m;$$

$$\sum_{i=m}^{m+1} a_i = a_m + a_{m+1}; \quad \sum_{i=m}^{m+2} a_i = a_m + a_{m+1} + a_{m+2}$$

(why?). Because of this, we sometimes express $\sum_{i=m}^n a_i$ less formally as

$$\sum_{i=m}^n a_i = a_m + a_{m+1} + \ldots + a_n.$$

Remark 7.1.2. The difference between “sum” and “series” is a subtle linguistic one.
Strictly speaking, a series is an expression of the form $\sum_{i=m}^n a_i$; this series is
mathematically (but not semantically) equal to a real number, which is then the sum of
that series. For instance, 1 + 2 + 3 + 4 + 5 is a series, whose sum is 15; if one were
to be very picky about semantics, one would not consider 15 a series and one would
not consider 1 + 2 + 3 + 4 + 5 a sum, despite the two expressions having the same
value. However, we will not be very careful about this distinction as it is purely
linguistic and has no bearing on the mathematics; the expressions 1 + 2 + 3 + 4 + 5
and 15 are the same number, and thus mathematically interchangeable, in the sense
of the axiom of substitution (see Sect. A.7), even if they are not semantically
interchangeable.


Remark 7.1.3 Note that the variable $i$ (sometimes called the index of summation)
is a bound variable (sometimes called a dummy variable); the expression $\sum_{i=m}^n a_i$
does not actually depend on any quantity named $i$. In particular, one can replace the
index of summation $i$ with any other symbol, and obtain the same sum:

$$\sum_{i=m}^n a_i = \sum_{j=m}^n a_j.$$

We list some basic properties of summation below.


Lemma 7.1.4 (a) Let $m \leq n < p$ be integers, and let $a_i$ be a real number assigned
to each integer $m \leq i \leq p$. Then we have

$$\sum_{i=m}^n a_i + \sum_{i=n+1}^p a_i = \sum_{i=m}^p a_i.$$

(b) Let $m \leq n$ be integers, $k$ be another integer, and let $a_i$ be a real number assigned
to each integer $m \leq i \leq n$. Then we have

$$\sum_{i=m}^n a_i = \sum_{j=m+k}^{n+k} a_{j-k}.$$

(c) Let $m \leq n$ be integers, and let $a_i$, $b_i$ be real numbers assigned to each integer
$m \leq i \leq n$. Then we have

$$\sum_{i=m}^n (a_i + b_i) = \left( \sum_{i=m}^n a_i \right) + \left( \sum_{i=m}^n b_i \right).$$

(d) Let $m \leq n$ be integers, and let $a_i$ be a real number assigned to each integer
$m \leq i \leq n$, and let $c$ be another real number. Then we have

$$\sum_{i=m}^n (c a_i) = c \left( \sum_{i=m}^n a_i \right).$$

(e) (Triangle inequality for finite series) Let $m \leq n$ be integers, and let $a_i$ be a real
number assigned to each integer $m \leq i \leq n$. Then we have

$$\left| \sum_{i=m}^n a_i \right| \leq \sum_{i=m}^n |a_i|.$$

(f) (Comparison test for finite series) Let $m \leq n$ be integers, and let $a_i$, $b_i$ be real
numbers assigned to each integer $m \leq i \leq n$. Suppose that $a_i \leq b_i$ for all
$m \leq i \leq n$. Then we have

$$\sum_{i=m}^n a_i \leq \sum_{i=m}^n b_i.$$


Proof See Exercise 7.1.1. 
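
Before attempting the proof, it can be reassuring to spot-check the six properties of Lemma 7.1.4 on sample data. The Python sketch below does this for one arbitrary choice of $m, n, p, k, c$ and of the terms $a_i$, $b_i$ (all chosen here for illustration); for (e) and (f) it uses the standard forms of the triangle inequality and the comparison test. A passing check on one example is of course no proof.

```python
def finite_sum(a, m, n):
    """Iterative evaluation of sum_{i=m}^{n} a(i); the sum is 0 when n < m."""
    total = 0
    for i in range(m, n + 1):
        total += a(i)
    return total

a = lambda i: i        # sample terms a_i
b = lambda i: i * i    # sample terms b_i; note i <= i*i for every integer i
m, n, p, k, c = -3, 4, 9, 5, 7

checks = {
    # (a) splitting a sum at n
    "a": finite_sum(a, m, n) + finite_sum(a, n + 1, p) == finite_sum(a, m, p),
    # (b) shifting the index by k
    "b": finite_sum(a, m, n) == finite_sum(lambda j: a(j - k), m + k, n + k),
    # (c) additivity
    "c": finite_sum(lambda i: a(i) + b(i), m, n)
         == finite_sum(a, m, n) + finite_sum(b, m, n),
    # (d) pulling out a constant factor
    "d": finite_sum(lambda i: c * a(i), m, n) == c * finite_sum(a, m, n),
    # (e) triangle inequality
    "e": abs(finite_sum(a, m, n)) <= finite_sum(lambda i: abs(a(i)), m, n),
    # (f) comparison test, valid here because a_i <= b_i
    "f": finite_sum(a, m, n) <= finite_sum(b, m, n),
}
print(checks)
```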


Remark 7.1.5 In the future we may omit some of the parentheses in series expressions;
for instance we may write $\sum_{i=m}^n (a_i + b_i)$ simply as $\sum_{i=m}^n a_i + b_i$. This is
reasonably safe from being misinterpreted, because the alternative interpretation
$(\sum_{i=m}^n a_i) + b_i$ does not make any sense (the index $i$ in $b_i$ is meaningless outside of
the summation, since $i$ is only a dummy variable).

One can use finite series to also define summations over finite sets:


Definition 7.1.6 (Summations over finite sets) Let $X$ be a finite set with $n$ elements
(where $n \in \mathbf{N}$), and let $f: X \to \mathbf{R}$ be a function from $X$ to the real numbers (i.e., $f$
assigns a real number $f(x)$ to each element $x$ of $X$). Then we can define the finite
sum $\sum_{x \in X} f(x)$ as follows. We first select any bijection $g$ from $\{i \in \mathbf{N} : 1 \leq i \leq n\}$
to $X$; such a bijection exists since $X$ is assumed to have $n$ elements. We then define

$$\sum_{x \in X} f(x) := \sum_{i=1}^n f(g(i)).$$

The same definition also permits us to define $\sum_{x \in X} f(x)$ when $f$ is defined on a
larger set $Y$ than $X$.


Example 7.1.7 Let $X$ be the three-element set $X := \{a, b, c\}$, where $a$, $b$, $c$ are distinct
objects, and let $f: X \to \mathbf{R}$ be the function $f(a) := 2$, $f(b) := 5$, $f(c) := -1$.
In order to compute the sum $\sum_{x \in X} f(x)$, we select a bijection $g: \{1, 2, 3\} \to X$,
e.g., $g(1) := a$, $g(2) := b$, $g(3) := c$. We then have

$$\sum_{x \in X} f(x) = \sum_{i=1}^{3} f(g(i)) = f(a) + f(b) + f(c) = 6.$$

One could pick another bijection from $\{1, 2, 3\}$ to $X$, e.g., $h(1) := c$, $h(2) := b$,
$h(3) := a$, but the end result is still the same:

$$\sum_{x \in X} f(x) = \sum_{i=1}^{3} f(h(i)) = f(c) + f(b) + f(a) = 6.$$
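
As an informal check of Example 7.1.7 (not a substitute for the proof of well-definedness), the following Python sketch sums $f$ along every bijection from $\{1, 2, 3\}$ to $X$. The names `sum_via_bijection` and `distinct_values` are illustrative choices and do not come from the text; each bijection $g$ is represented by the tuple $(g(1), g(2), g(3))$.

```python
from itertools import permutations

# The set X and the function f from Example 7.1.7.
X = ["a", "b", "c"]
f = {"a": 2, "b": 5, "c": -1}


def sum_via_bijection(g):
    """Compute sum_{i=1}^{n} f(g(i)), where the tuple g lists g(1), ..., g(n)."""
    total = 0
    for x in g:
        total += f[x]
    return total


# Each ordering of X corresponds to exactly one bijection from {1, 2, 3} to X.
sums = {g: sum_via_bijection(g) for g in permutations(X)}

# Collect the values obtained; well-definedness predicts a single value.
distinct_values = set(sums.values())
print(distinct_values)
```

All six bijections produce the same total, which is the behaviour that the definition needs in order to make sense.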


To verify that this definition actually does give a single, well-defined value to
$\sum_{x \in X} f(x)$, one has to check that different bijections $g$ from $\{i \in \mathbf{N} : 1 \le i \le n\}$ to
$X$ give the same sum. In other words, we must prove


Proposition 7.1.8 (Finite summations are well-defined) Let $X$ be a finite set with $n$
elements (where $n \in \mathbf{N}$), let $f: X \to \mathbf{R}$ be a function, and let
$g: \{i \in \mathbf{N} : 1 \le i \le n\} \to X$ and $h: \{i \in \mathbf{N} : 1 \le i \le n\} \to X$ be bijections. Then we have

$$\sum_{i=1}^{n} f(g(i)) = \sum_{i=1}^{n} f(h(i)).$$


Remark 7.1.9 The issue is somewhat more complicated when summing over infinite 
sets; see Section 8.2. 


Proof We use induction on $n$; more precisely, we let $P(n)$ be the assertion that
“For any set $X$ of $n$ elements, any function $f: X \to \mathbf{R}$, and any two bijections $g$,
$h$ from $\{i \in \mathbf{N} : 1 \le i \le n\}$ to $X$, we have $\sum_{i=1}^{n} f(g(i)) = \sum_{i=1}^{n} f(h(i))$”. (More
informally, $P(n)$ is the assertion that Proposition 7.1.8 is true for that value of $n$.)
We want to prove that $P(n)$ is true for all natural numbers $n$.

We first check the base case $P(0)$. In this case $\sum_{i=1}^{0} f(g(i))$ and $\sum_{i=1}^{0} f(h(i))$
are both equal to 0, by definition of finite series, so we are done.

Now suppose inductively that $P(n)$ is true; we now prove that $P(n+1)$ is true.
Thus, let $X$ be a set with $n + 1$ elements, let $f: X \to \mathbf{R}$ be a function, and let $g$ and
$h$ be bijections from $\{i \in \mathbf{N} : 1 \le i \le n + 1\}$ to $X$. We have to prove that

$$\sum_{i=1}^{n+1} f(g(i)) = \sum_{i=1}^{n+1} f(h(i)). \tag{7.1}$$

Let $x := g(n + 1)$; thus $x$ is an element of $X$. By definition of finite series, we can
expand the left-hand side of (7.1) as

$$\sum_{i=1}^{n+1} f(g(i)) = \left( \sum_{i=1}^{n} f(g(i)) \right) + f(x).$$


Now let us look at the right-hand side of (7.1). Ideally we would like to have $h(n + 1)$
also equal to $x$ (this would allow us to use the inductive hypothesis $P(n)$ much more
easily), but we cannot assume this. However, since $h$ is a bijection, we do know that
there is some index $j$, with $1 \le j \le n + 1$, for which $h(j) = x$. We now use Lemma
7.1.4 and the definition of finite series to write

$$\sum_{i=1}^{n+1} f(h(i)) = \left( \sum_{i=1}^{j} f(h(i)) \right) + \left( \sum_{i=j+1}^{n+1} f(h(i)) \right)$$

$$= \left( \sum_{i=1}^{j-1} f(h(i)) \right) + f(h(j)) + \left( \sum_{i=j+1}^{n+1} f(h(i)) \right)$$

$$= \left( \sum_{i=1}^{j-1} f(h(i)) \right) + f(x) + \left( \sum_{i=j}^{n} f(h(i+1)) \right).$$


We now define the function $\tilde{h}: \{i \in \mathbf{N} : 1 \le i \le n\} \to X - \{x\}$ by setting $\tilde{h}(i) := h(i)$
when $i < j$ and $\tilde{h}(i) := h(i + 1)$ when $i \ge j$. We can thus write the right-hand
side of (7.1) as

$$\left( \sum_{i=1}^{j-1} f(\tilde{h}(i)) \right) + f(x) + \left( \sum_{i=j}^{n} f(\tilde{h}(i)) \right) = \left( \sum_{i=1}^{n} f(\tilde{h}(i)) \right) + f(x),$$

where we have used Lemma 7.1.4 once again. Thus to finish the proof of (7.1) we
have to show that

$$\sum_{i=1}^{n} f(g(i)) = \sum_{i=1}^{n} f(\tilde{h}(i)). \tag{7.2}$$

But the function $g$ (when restricted to $\{i \in \mathbf{N} : 1 \le i \le n\}$) is a bijection from
$\{i \in \mathbf{N} : 1 \le i \le n\}$ to $X - \{x\}$ (why?). The function $\tilde{h}$ is also a bijection from
$\{i \in \mathbf{N} : 1 \le i \le n\}$ to $X - \{x\}$ (why? cf. Lemma 3.6.9). Since $X - \{x\}$ has $n$ elements (by
Lemma 3.6.9), the claim (7.2) then follows directly from the induction hypothesis
$P(n)$.


Remark 7.1.10 Suppose that $X$ is a set, that $P(x)$ is a property pertaining to an
element $x$ of $X$, and $f: \{y \in X : P(y) \text{ is true}\} \to \mathbf{R}$ is a function. Then we will
often abbreviate

$$\sum_{x \in \{y \in X : P(y) \text{ is true}\}} f(x)$$

as $\sum_{x \in X : P(x) \text{ is true}} f(x)$ or even as $\sum_{P(x) \text{ is true}} f(x)$ when there is no chance
of confusion. For instance, $\sum_{n \in \mathbf{N} : 2 \le n \le 4} f(n)$ or $\sum_{2 \le n \le 4} f(n)$ are both short-hand
for $\sum_{n \in \{2, 3, 4\}} f(n) = f(2) + f(3) + f(4)$. (This convention is currently limited to
cases in which $\{y \in X : P(y) \text{ is true}\}$ is finite, but in later sections we will also
define sums over infinite sets, in which case this convention will also extend to such
settings.)
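
The abbreviation in Remark 7.1.10 has a close analogue in a filtered generator expression. The sketch below is only an illustration: the summand `f` and the finite range standing in for an initial segment of $\mathbf{N}$ are hypothetical choices, not taken from the text.

```python
def f(n):
    # Hypothetical summand, chosen only for illustration.
    return n * n


def P(n):
    # The property "2 <= n <= 4" used in the remark.
    return 2 <= n <= 4


# A finite stand-in for the ambient set; only finitely many n satisfy P.
X = range(0, 10)

# Sum over {n in X : P(n) is true}, written in the abbreviated style.
abbreviated = sum(f(n) for n in X if P(n))

# The same sum written out term by term.
explicit = f(2) + f(3) + f(4)

print(abbreviated, explicit)
```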


The following properties of summation on finite sets are fairly obvious but do 
require a rigorous proof: 


Proposition 7.1.11 (Basic properties of summation over finite sets)
(a) If $X$ is empty, and $f: X \to \mathbf{R}$ is a function (i.e., $f$ is the empty function), we
have

$$\sum_{x \in X} f(x) = 0.$$

(b) If $X$ consists of a single element, $X = \{x_0\}$, and $f: X \to \mathbf{R}$ is a function, we
have

$$\sum_{x \in X} f(x) = f(x_0).$$

(c) (Substitution, part I) If $X$ is a finite set, $f: X \to \mathbf{R}$ is a function, and $g: Y \to X$
is a bijection, then

$$\sum_{x \in X} f(x) = \sum_{y \in Y} f(g(y)).$$


(d) (Substitution, part II) Let $n \le m$ be integers, and let $X$ be the set
$X := \{i \in \mathbf{Z} : n \le i \le m\}$. If $a_i$ is a real number assigned to each integer $i \in X$, then we have

$$\sum_{i=n}^{m} a_i = \sum_{i \in X} a_i.$$

(e) Let $X$, $Y$ be disjoint finite sets (so $X \cap Y = \emptyset$), and $f: X \cup Y \to \mathbf{R}$ is a function.
Then we have

$$\sum_{z \in X \cup Y} f(z) = \left( \sum_{x \in X} f(x) \right) + \left( \sum_{y \in Y} f(y) \right).$$

(f) (Linearity, part I) Let $X$ be a finite set, and let $f: X \to \mathbf{R}$ and $g: X \to \mathbf{R}$ be
functions. Then

$$\sum_{x \in X} (f(x) + g(x)) = \sum_{x \in X} f(x) + \sum_{x \in X} g(x).$$

(g) (Linearity, part II) Let $X$ be a finite set, let $f: X \to \mathbf{R}$ be a function, and let $c$
be a real number. Then

$$\sum_{x \in X} c f(x) = c \sum_{x \in X} f(x).$$

(h) (Monotonicity) Let $X$ be a finite set, and let $f: X \to \mathbf{R}$ and $g: X \to \mathbf{R}$ be
functions such that $f(x) \le g(x)$ for all $x \in X$. Then we have

$$\sum_{x \in X} f(x) \le \sum_{x \in X} g(x).$$

(i) (Triangle inequality) Let $X$ be a finite set, and let $f: X \to \mathbf{R}$ be a function. Then

$$\left| \sum_{x \in X} f(x) \right| \le \sum_{x \in X} |f(x)|.$$


Proof See Exercise 7.1.2. 


Remark 7.1.12 The substitution rule in Proposition 7.1.11(c) can be thought of as 
making the substitution x := g(y) (hence the name). Note that the assumption that g 
is a bijection is essential; can you see why the rule may fail when g is not one-to-one 
or not onto? From Proposition 7.1.11(c) and (d) we see that

$$\sum_{i=n}^{m} a_i = \sum_{i=n}^{m} a_{f(i)}$$

for any bijection $f$ from the set $\{i \in \mathbf{Z} : n \le i \le m\}$ to itself. Informally, this means
that we can rearrange the elements of a finite sequence at will and still obtain the 
same value. 


Now we look at double finite series—finite series of finite series—and how they 
connect with Cartesian products. 


Lemma 7.1.13 Let $X$, $Y$ be finite sets, and let $f: X \times Y \to \mathbf{R}$ be a function. Then

$$\sum_{x \in X} \left( \sum_{y \in Y} f(x, y) \right) = \sum_{(x, y) \in X \times Y} f(x, y).$$


Proof Let $n$ be the number of elements in $X$. We will use induction on $n$ (cf. Proposition
7.1.8); i.e., we let $P(n)$ be the assertion that Lemma 7.1.13 is true for any set
$X$ with $n$ elements, and any finite set $Y$ and any function $f: X \times Y \to \mathbf{R}$. We wish
to prove $P(n)$ for all natural numbers $n$.

The base case $P(0)$ is easy, following from Proposition 7.1.11(a) (why?). Now
suppose that $P(n)$ is true; we now show that $P(n + 1)$ is true. Let $X$ be a set with
$n + 1$ elements. In particular, by Lemma 3.6.9, we can write $X = X' \cup \{x_0\}$, where
$x_0$ is an element of $X$ and $X' := X - \{x_0\}$ has $n$ elements. Then by Proposition
7.1.11(e) we have

$$\sum_{x \in X} \left( \sum_{y \in Y} f(x, y) \right) = \left( \sum_{x \in X'} \left( \sum_{y \in Y} f(x, y) \right) \right) + \left( \sum_{y \in Y} f(x_0, y) \right);$$

by the induction hypothesis this is equal to

$$\sum_{(x, y) \in X' \times Y} f(x, y) + \left( \sum_{y \in Y} f(x_0, y) \right).$$

By Proposition 7.1.11(c) this is equal to

$$\sum_{(x, y) \in X' \times Y} f(x, y) + \left( \sum_{(x, y) \in \{x_0\} \times Y} f(x, y) \right).$$

By Proposition 7.1.11(e) this is equal to

$$\sum_{(x, y) \in X \times Y} f(x, y)$$

(why?) as desired.


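Corollary 7.1.14 (Fubini’s theorem for finite series) Let $X$, $Y$ be finite sets, and let
$f: X \times Y \to \mathbf{R}$ be a function. Then

$$\sum_{x \in X} \left( \sum_{y \in Y} f(x, y) \right) = \sum_{(x, y) \in X \times Y} f(x, y)$$

$$= \sum_{(y, x) \in Y \times X} f(x, y)$$

$$= \sum_{y \in Y} \left( \sum_{x \in X} f(x, y) \right).$$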


Proof In light of Lemma 7.1.13, it suffices to show that

$$\sum_{(x, y) \in X \times Y} f(x, y) = \sum_{(y, x) \in Y \times X} f(x, y).$$

But this follows from Proposition 7.1.11(c) by applying the bijection $h: Y \times X \to X \times Y$
defined by $h(y, x) := (x, y)$. (Why is this a bijection, and why does Proposition
7.1.11(c) give us what we want?)
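
The four expressions in Corollary 7.1.14 can be compared numerically on a small example. The sketch below is only a sanity check under assumed data: the sets `X`, `Y` and the summand `f` are hypothetical and were picked so that $f(x, y) \ne f(y, x)$ in general.

```python
from itertools import product

X = [1, 2, 3]
Y = [10, 20]


def f(x, y):
    # Hypothetical summand f: X x Y -> R, not taken from the text.
    return x * y + x - y


# Sum over x of (sum over y), as on the left-hand side of the corollary.
rows_then_cols = sum(sum(f(x, y) for y in Y) for x in X)

# Sum over the Cartesian product X x Y.
over_product = sum(f(x, y) for (x, y) in product(X, Y))

# Sum over Y x X, using the bijection h(y, x) := (x, y) from the proof.
over_swapped = sum(f(x, y) for (y, x) in product(Y, X))

# Sum over y of (sum over x).
cols_then_rows = sum(sum(f(x, y) for x in X) for y in Y)

print(rows_then_cols, over_product, over_swapped, cols_then_rows)
```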


Remark 7.1.15 This should be contrasted with Example 1.2.5; thus we anticipate 
something interesting to happen when we move from finite sums to infinite sums. 
However, see Theorem 8.2.2. 


— Exercises — 


Exercise 7.1.1 Prove Lemma 7.1.4. (Hint: you will need to use induction, but the base case might 
not necessarily be at 0.) 


Exercise 7.1.2 Prove Proposition 7.1.11. (Hint: this is not as lengthy as it may first appear. It is 
largely a matter of choosing the right bijections to turn these sums over sets into finite series, and 
then applying Lemma 7.1.4.) 

Exercise 7.1.3 Form a definition for the finite products $\prod_{i=1}^{n} a_i$ and $\prod_{x \in X} f(x)$. Which of the
above results for finite series have analogues for finite products? (Note that it is dangerous to apply
logarithms because some of the $a_i$ or $f(x)$ could be zero or negative. Besides, we haven’t defined
logarithms yet.)


Exercise 7.1.4 Define the factorial function $n!$ for natural numbers $n$ by the recursive definition
$0! := 1$ and $(n + 1)! := n! \times (n + 1)$. If $x$ and $y$ are real numbers, prove the binomial formula

$$(x + y)^n = \sum_{j=0}^{n} \frac{n!}{j!(n-j)!} x^j y^{n-j}$$

for all natural numbers $n$. (Hint: induct on $n$.)


Exercise 7.1.5 Let $X$ be a finite set, let $m$ be an integer, and for each $x \in X$ let $(a_n(x))_{n=m}^{\infty}$ be
a convergent sequence of real numbers. Show that the sequence $(\sum_{x \in X} a_n(x))_{n=m}^{\infty}$ is convergent,
and

$$\lim_{n \to \infty} \sum_{x \in X} a_n(x) = \sum_{x \in X} \lim_{n \to \infty} a_n(x).$$

(Hint: induct on the cardinality of $X$, and use Theorem 6.1.19(a).) Thus we may always interchange
finite sums with convergent limits. Things however get trickier with infinite sums; see Corollary
8.2.11 of Analysis I.


Exercise 7.1.6 Let $I$ be a finite set, and for each $i \in I$, let $E_i$ be a finite set. Suppose that the
$E_i$ are pairwise disjoint, which means that $E_i \cap E_j = \emptyset$ whenever $i, j \in I$ are distinct. For each
$x \in \bigcup_{i \in I} E_i$, let $f(x)$ be a real number. Show that
$\sum_{x \in \bigcup_{i \in I} E_i} f(x) = \sum_{i \in I} \sum_{x \in E_i} f(x)$.


Exercise 7.1.7 Let $n$, $m$ be natural numbers, and for each $1 \le i \le n$ let $a_i$ be a natural number with
$a_i \le m$. Establish the identity

$$\sum_{i=1}^{n} a_i = \sum_{j=1}^{m} \#(\{1 \le i \le n : a_i \ge j\}).$$

(Hint: apply Corollary 7.1.14 to compute a sum $\sum_{i=1}^{n} \sum_{j=1}^{m} c_{i,j}$ in two different ways, for a well
chosen choice of summands $c_{i,j}$.) Use of identities such as this is known as the double counting
method, and is often useful in combinatorics.


7.2 Infinite Series


We are now ready to sum infinite series.


Definition 7.2.1 (Formal infinite series) A (formal) infinite series is any expression
of the form

$$\sum_{n=m}^{\infty} a_n,$$

where $m$ is an integer, and $a_n$ is a real number for any integer $n \ge m$. We sometimes
write this series as

$$a_m + a_{m+1} + a_{m+2} + \dots.$$

At present, this series is only defined formally; we have not set this sum equal
to any real number; the notation $a_m + a_{m+1} + a_{m+2} + \dots$ is of course designed to
look very suggestively like a sum, but is not actually a finite sum because of the
“$\dots$” symbol. To rigorously define what the series actually sums to, we need another
definition.


Definition 7.2.2 (Convergence of series) Let $\sum_{n=m}^{\infty} a_n$ be a formal infinite series.
For any integer $N \ge m$, we define the $N^{\mathrm{th}}$ partial sum $S_N$ of this series to be
$S_N := \sum_{n=m}^{N} a_n$; of course, $S_N$ is a real number. If the sequence $(S_N)_{N=m}^{\infty}$ converges to some
limit $L$ as $N \to \infty$, then we say that the infinite series $\sum_{n=m}^{\infty} a_n$ is convergent, and
converges to $L$; we also write $L = \sum_{n=m}^{\infty} a_n$, and say that $L$ is the sum of the infinite
series $\sum_{n=m}^{\infty} a_n$. If the partial sums $S_N$ diverge, then we say that the infinite series
$\sum_{n=m}^{\infty} a_n$ is divergent, and we do not assign any real number value to that series.


Remark 7.2.3 Note that Proposition 6.1.7 shows that if a series converges, then it
has a unique sum, so it is safe to talk about the sum $L = \sum_{n=m}^{\infty} a_n$ of a convergent
series.


Example 7.2.4 Consider the formal infinite series

$$\sum_{n=1}^{\infty} 2^{-n} = 2^{-1} + 2^{-2} + 2^{-3} + \dots.$$

The partial sums can be verified to equal

$$S_N = \sum_{n=1}^{N} 2^{-n} = 1 - 2^{-N}$$

by an easy induction argument (or by Lemma 7.3.3); the sequence $1 - 2^{-N}$ converges
to 1 as $N \to \infty$, and hence we have

$$\sum_{n=1}^{\infty} 2^{-n} = 1.$$

In particular, this series is convergent. On the other hand, if we consider the series

$$\sum_{n=1}^{\infty} 2^{n} = 2^{1} + 2^{2} + 2^{3} + \dots$$

then the partial sums are

$$S_N = \sum_{n=1}^{N} 2^{n} = 2^{N+1} - 2$$

and this is easily shown to be an unbounded sequence, and hence divergent. Thus
the series $\sum_{n=1}^{\infty} 2^{n}$ is divergent.
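
The two partial-sum formulas in Example 7.2.4 can be checked for small $N$ with exact rational arithmetic. This is a minimal sketch, not a proof; the helper names `partial_sum`, `half_power` and `two_power` are illustrative and do not appear in the text.

```python
from fractions import Fraction


def partial_sum(term, N, start=1):
    """Return S_N = sum_{n=start}^{N} term(n) using exact rational arithmetic."""
    return sum((term(n) for n in range(start, N + 1)), Fraction(0))


def half_power(n):
    # The term 2^{-n} of the convergent series.
    return Fraction(1, 2 ** n)


def two_power(n):
    # The term 2^{n} of the divergent series.
    return Fraction(2 ** n)


for N in (1, 2, 5, 10):
    # S_N = 1 - 2^{-N} stays below 1 and approaches it.
    assert partial_sum(half_power, N) == 1 - Fraction(1, 2 ** N)
    # S_N = 2^{N+1} - 2 grows without bound.
    assert partial_sum(two_power, N) == 2 ** (N + 1) - 2

print(partial_sum(half_power, 10), partial_sum(two_power, 10))
```

Exact fractions are used instead of floating point so that the comparison with the closed forms is an equality rather than an approximation.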


Now we address the question of when a series converges. The following proposition
shows that a series converges iff the “tail” of the sequence is eventually less
than $\varepsilon$ for any $\varepsilon > 0$:

Proposition 7.2.5 Let $\sum_{n=m}^{\infty} a_n$ be a formal series of real numbers. Then $\sum_{n=m}^{\infty} a_n$
converges if and only if, for every real number $\varepsilon > 0$, there exists an integer $N \ge m$
such that

$$\left| \sum_{n=p}^{q} a_n \right| \le \varepsilon \text{ for all } p, q \ge N.$$


Proof See Exercise 7.2.2.


This proposition, by itself, is not very handy, because it is not so easy to compute
the partial sums $\sum_{n=p}^{q} a_n$ in practice. However, it has a number of useful corollaries.
For instance:

Corollary 7.2.6 (Zero test) Let $\sum_{n=m}^{\infty} a_n$ be a convergent series of real numbers.
Then we must have $\lim_{n \to \infty} a_n = 0$. To put this another way, if $\lim_{n \to \infty} a_n$ is non-zero
or divergent, then the series $\sum_{n=m}^{\infty} a_n$ is divergent.


Proof See Exercise 7.2.3.


Example 7.2.7 The sequence $a_n := 1$ does not converge to 0 as $n \to \infty$, so we know
that $\sum_{n=1}^{\infty} 1$ is a divergent series. (Note however that $1, 1, 1, 1, \dots$ is a convergent
sequence; convergence of series is a different notion from convergence of sequences.)
Similarly, the sequence $a_n := (-1)^n$ diverges, and in particular does not converge to
zero; thus the series $\sum_{n=1}^{\infty} (-1)^n$ is also divergent.

If a sequence $(a_n)_{n=m}^{\infty}$ does converge to zero, then the series $\sum_{n=m}^{\infty} a_n$ may or may
not be convergent; it depends on the series. For instance, we will soon see that the
series $\sum_{n=1}^{\infty} 1/n$ is divergent despite the fact that $1/n$ converges to 0 as $n \to \infty$.


Definition 7.2.8 (Absolute convergence) Let $\sum_{n=m}^{\infty} a_n$ be a formal series of real
numbers. We say that this series is absolutely convergent iff the series $\sum_{n=m}^{\infty} |a_n|$ is
convergent.


Proposition 7.2.9 (Absolute convergence test) Let $\sum_{n=m}^{\infty} a_n$ be a formal series of
real numbers. If this series is absolutely convergent, then it is also convergent.
Furthermore, in this case we have the triangle inequality

$$\left| \sum_{n=m}^{\infty} a_n \right| \le \sum_{n=m}^{\infty} |a_n|.$$


Proof See Exercise 7.2.4.


Remark 7.2.10 The converse to this proposition is not true; there exist series which 
are convergent but not absolutely convergent. See Example 7.2.12. Series that are 
convergent but not absolutely convergent are also known as conditionally convergent 
series. 


Proposition 7.2.11 (Alternating series test) Let $(a_n)_{n=m}^{\infty}$ be a sequence of real numbers
which are non-negative and decreasing, thus $a_n \ge 0$ and $a_n \ge a_{n+1}$ for every
$n \ge m$. Then the series $\sum_{n=m}^{\infty} (-1)^n a_n$ is convergent if and only if the sequence $a_n$
converges to 0 as $n \to \infty$.


Proof From the zero test, we know that if $\sum_{n=m}^{\infty} (-1)^n a_n$ is a convergent series,
then the sequence $((-1)^n a_n)_{n=m}^{\infty}$ converges to 0, which implies that $(a_n)_{n=m}^{\infty}$ also
converges to 0, since $(-1)^n a_n$ and $a_n$ have the same distance from 0.

Now suppose conversely that $(a_n)_{n=m}^{\infty}$ converges to 0. For each $N \ge m$, let $S_N$ be
the partial sum $S_N := \sum_{n=m}^{N} (-1)^n a_n$; our job is to show that $(S_N)_{N=m}^{\infty}$ converges.
Observe that

$$S_{N+2} = S_N + (-1)^{N+1} a_{N+1} + (-1)^{N+2} a_{N+2}$$

$$= S_N + (-1)^{N+1} (a_{N+1} - a_{N+2}).$$

But by hypothesis, $(a_{N+1} - a_{N+2})$ is non-negative. Thus we have $S_{N+2} \ge S_N$ when
$N$ is odd and $S_{N+2} \le S_N$ if $N$ is even.

Now suppose that $N$ is even. From the above discussion and induction we see
that $S_{N+2k} \le S_N$ for all natural numbers $k$ (why?). Also we have $S_{N+2k+1} \ge S_{N+1} = S_N - a_{N+1}$
(why?). Finally, we have $S_{N+2k+1} = S_{N+2k} - a_{N+2k+1} \le S_{N+2k}$ (why?).
Thus we have

$$S_N - a_{N+1} \le S_{N+2k+1} \le S_{N+2k} \le S_N$$

for all $k$. In particular, we have

$$S_N - a_{N+1} \le S_n \le S_N \text{ for all } n \ge N$$

(why?). In particular, the sequence $S_n$ is eventually $a_{N+1}$-steady. But the sequence
$(a_N)_{N=m}^{\infty}$ converges to 0 as $N \to \infty$, thus this implies that $S_n$ is eventually $\varepsilon$-steady
for every $\varepsilon > 0$ (why?). Thus $(S_n)_{n=m}^{\infty}$ converges, and so the series $\sum_{n=m}^{\infty} (-1)^n a_n$ is
convergent.
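
The key estimate in the proof above, $S_N - a_{N+1} \le S_n \le S_N$ for even $N$ and all $n \ge N$, can be observed numerically. The sketch below assumes the specific sequence $a_n := 1/n$ with $m = 1$ (the series of Example 7.2.12) and checks only finitely many $n$, so it illustrates the inequality rather than proving it; the function names are illustrative.

```python
from fractions import Fraction


def a(n):
    # Assumed example sequence: non-negative, decreasing, tending to 0.
    return Fraction(1, n)


def S(N, m=1):
    """Partial sum S_N = sum_{n=m}^{N} (-1)^n a_n, computed exactly."""
    return sum(((-1) ** n * a(n) for n in range(m, N + 1)), Fraction(0))


N = 10  # an even index, as in the proof
lower, upper = S(N) - a(N + 1), S(N)

# Every later partial sum (in the range tested) stays in [lower, upper].
inside = all(lower <= S(n) <= upper for n in range(N, N + 200))

print(float(lower), float(upper), inside)
```

The width of the bracket is $a_{N+1}$, which is why the partial sums are eventually $a_{N+1}$-steady.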


Example 7.2.12 The sequence $(1/n)_{n=1}^{\infty}$ is non-negative, decreasing, and converges
to zero. Thus $\sum_{n=1}^{\infty} (-1)^n/n$ is convergent (but it is not absolutely convergent,
because $\sum_{n=1}^{\infty} 1/n$ diverges, see Corollary 7.3.7). Thus lack of absolute convergence
does not imply lack of convergence, even though absolute convergence implies
convergence.

Some basic identities concerning convergent series are collected below.

Proposition 7.2.13 (Series laws)

(a) If $\sum_{n=m}^{\infty} a_n$ is a series of real numbers converging to $x$, and $\sum_{n=m}^{\infty} b_n$ is a series of
real numbers converging to $y$, then $\sum_{n=m}^{\infty} (a_n + b_n)$ is also a convergent series,
and converges to $x + y$. In particular, we have

$$\sum_{n=m}^{\infty} (a_n + b_n) = \sum_{n=m}^{\infty} a_n + \sum_{n=m}^{\infty} b_n.$$

(b) If $\sum_{n=m}^{\infty} a_n$ is a series of real numbers converging to $x$, and $c$ is a real number,
then $\sum_{n=m}^{\infty} (c a_n)$ is also a convergent series, and converges to $cx$. In particular,
we have

$$\sum_{n=m}^{\infty} (c a_n) = c \sum_{n=m}^{\infty} a_n.$$

(c) Let $\sum_{n=m}^{\infty} a_n$ be a series of real numbers, and let $k \ge 0$ be an integer. If one of
the two series $\sum_{n=m}^{\infty} a_n$ and $\sum_{n=m+k}^{\infty} a_n$ is convergent, then the other one is
also, and we have the identity

$$\sum_{n=m}^{\infty} a_n = \sum_{n=m}^{m+k-1} a_n + \sum_{n=m+k}^{\infty} a_n.$$

(d) Let $\sum_{n=m}^{\infty} a_n$ be a series of real numbers converging to $x$, and let $k$ be an integer.
Then $\sum_{n=m+k}^{\infty} a_{n-k}$ also converges to $x$.


Proof See Exercise 7.2.5. 


From Proposition 7.2.13(c) we see that the convergence of a series does not depend 
on the first few elements of the series (though of course those elements do influence 
which value the series converges to). Because of this, we will usually not pay much 
attention as to what the initial index m of the series is. 

There is one type of series, called telescoping series, which are easy to sum:


Lemma 7.2.14 (Telescoping series) Let $(a_n)_{n=0}^{\infty}$ be a sequence of real numbers
which converges to 0, i.e., $\lim_{n \to \infty} a_n = 0$. Then the series $\sum_{n=0}^{\infty} (a_n - a_{n+1})$
converges to $a_0$.


Proof See Exercise 7.2.6. 


— Exercises — 


Exercise 7.2.1 Is the series $\sum_{n=1}^{\infty} (-1)^n$ convergent or divergent? Justify your answer. Can you
now resolve the difficulty in Example 1.2.2?


Exercise 7.2.2 Prove Proposition 7.2.5. (Hint: use Proposition 6.1.12 and Theorem 6.4.18.)

Exercise 7.2.3 Use Proposition 7.2.5 to prove Corollary 7.2.6.

Exercise 7.2.4 Prove Proposition 7.2.9. (Hint: use Proposition 7.2.5 and Lemma 7.1.4(e).)

Exercise 7.2.5 Prove Proposition 7.2.13. (Hint: use Theorem 6.1.19.)


Exercise 7.2.6 Prove Lemma 7.2.14. (Hint: First work out what the partial sums $\sum_{n=0}^{N} (a_n - a_{n+1})$
should be, and prove your assertion using induction.) How does the proposition change if we assume
that $a_n$ does not converge to zero, but instead converges to some other real number $L$?


7.3 Sums of Non-negative Numbers 

Now we specialize the preceding discussion in order to consider sums $\sum_{n=m}^{\infty} a_n$
where all the terms $a_n$ are non-negative. This situation comes up, for instance, from
the absolute convergence test, since the absolute value $|a_n|$ of a real number $a_n$ is
always non-negative. Note that when all the terms in a series are non-negative, there
is no distinction between convergence and absolute convergence.

Suppose $\sum_{n=m}^{\infty} a_n$ is a series of non-negative numbers. Then the partial sums
$S_N := \sum_{n=m}^{N} a_n$ are increasing, i.e., $S_{N+1} \ge S_N$ for all $N \ge m$ (why?). From Proposition
6.3.8 and Corollary 6.1.17, we thus see that the sequence $(S_N)_{N=m}^{\infty}$ is convergent
if and only if it has an upper bound $M$. In other words, we have just shown

Proposition 7.3.1 Let $\sum_{n=m}^{\infty} a_n$ be a formal series of non-negative real numbers.
Then this series is convergent if and only if there is a real number $M$ such that

$$\sum_{n=m}^{N} a_n \le M \text{ for all integers } N \ge m.$$

A simple corollary of this is 


Corollary 7.3.2 (Comparison test) Let $\sum_{n=m}^{\infty} a_n$ and $\sum_{n=m}^{\infty} b_n$ be two formal series
of real numbers, and suppose that $|a_n| \le b_n$ for all $n \ge m$. Then if $\sum_{n=m}^{\infty} b_n$ is
convergent, then $\sum_{n=m}^{\infty} a_n$ is absolutely convergent, and in fact

$$\left| \sum_{n=m}^{\infty} a_n \right| \le \sum_{n=m}^{\infty} |a_n| \le \sum_{n=m}^{\infty} b_n.$$


Proof See Exercise 7.3.1. 


We can also run the comparison test in the contrapositive: if we have $|a_n| \le b_n$
for all $n \ge m$, and $\sum_{n=m}^{\infty} a_n$ is not absolutely convergent, then $\sum_{n=m}^{\infty} b_n$ is not
convergent. (Why does this follow immediately from Corollary 7.3.2?)

A useful series to use in the comparison test is the geometric series

$$\sum_{n=0}^{\infty} x^n,$$

where $x$ is some real number:


Lemma 7.3.3 (Geometric series) Let $x$ be a real number. If $|x| \ge 1$, then the series
$\sum_{n=0}^{\infty} x^n$ is divergent. If however $|x| < 1$, then the series is absolutely convergent
and

$$\sum_{n=0}^{\infty} x^n = 1/(1 - x).$$


Proof See Exercise 7.3.2. 
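
The finite geometric series formula $\sum_{n=0}^{N} x^n = (1 - x^{N+1})/(1 - x)$ quoted in the hint to Exercise 7.3.2 can be checked exactly for a sample ratio. The sketch below assumes $x = 1/3$ purely for illustration and also records how far the partial sum $S_{20}$ is from the limit $1/(1 - x)$; it does not replace the induction asked for in the exercise.

```python
from fractions import Fraction


def geometric_partial_sum(x, N):
    """Return sum_{n=0}^{N} x^n for a rational x, computed exactly."""
    return sum((x ** n for n in range(N + 1)), Fraction(0))


x = Fraction(1, 3)  # assumed sample ratio with |x| < 1

for N in range(0, 8):
    # Compare the direct sum with the closed form (1 - x^{N+1}) / (1 - x).
    assert geometric_partial_sum(x, N) == (1 - x ** (N + 1)) / (1 - x)

limit = 1 / (1 - x)
gap = limit - geometric_partial_sum(x, 20)  # equals x^{21} / (1 - x)

print(limit, float(gap))
```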


We now give a useful criterion, known as the Cauchy criterion, to test whether a 
series of non-negative but decreasing terms is convergent. 


Proposition 7.3.4 (Cauchy criterion) Let $(a_n)_{n=1}^{\infty}$ be a decreasing sequence of
non-negative real numbers (so $a_n \ge 0$ and $a_{n+1} \le a_n$ for all $n \ge 1$). Then the series
$\sum_{n=1}^{\infty} a_n$ is convergent if and only if the series

$$\sum_{k=0}^{\infty} 2^k a_{2^k} = a_1 + 2a_2 + 4a_4 + 8a_8 + \dots$$

is convergent.


Remark 7.3.5 An interesting feature of this criterion is that it only uses a small
number of elements of the sequence $a_n$ (namely, those elements whose index $n$ is a
power of 2, $n = 2^k$) in order to determine whether the whole series is convergent or
not.


Proof Let $S_N := \sum_{n=1}^{N} a_n$ be the partial sums of $\sum_{n=1}^{\infty} a_n$, and let $T_K := \sum_{k=0}^{K} 2^k a_{2^k}$
be the partial sums of $\sum_{k=0}^{\infty} 2^k a_{2^k}$. In light of Proposition 7.3.1, our task is to show that
the sequence $(S_N)_{N=1}^{\infty}$ is bounded if and only if the sequence $(T_K)_{K=0}^{\infty}$ is bounded.
To do this we need the following claim:


Lemma 7.3.6 For any natural number $K$, we have $S_{2^{K+1}-1} \le T_K \le 2 S_{2^K}$.

Proof We use induction on $K$. First we prove the claim when $K = 0$, i.e.

$$S_1 \le T_0 \le 2 S_1.$$

This becomes

$$a_1 \le a_1 \le 2 a_1,$$

which is clearly true, since $a_1$ is non-negative.
Now suppose the claim has been proven for $K$, and now we try to prove it for
$K + 1$:

$$S_{2^{K+2}-1} \le T_{K+1} \le 2 S_{2^{K+1}}.$$

Clearly we have

$$T_{K+1} = T_K + 2^{K+1} a_{2^{K+1}}.$$

Also, we have (using Lemma 7.1.4(a) and (f), and the hypothesis that the $a_n$ are
decreasing)

$$S_{2^{K+1}} = S_{2^K} + \sum_{n=2^K+1}^{2^{K+1}} a_n \ge S_{2^K} + \sum_{n=2^K+1}^{2^{K+1}} a_{2^{K+1}} = S_{2^K} + 2^K a_{2^{K+1}}$$

and hence

$$2 S_{2^{K+1}} \ge 2 S_{2^K} + 2^{K+1} a_{2^{K+1}}.$$

Similarly we have

$$S_{2^{K+2}-1} = S_{2^{K+1}-1} + \sum_{n=2^{K+1}}^{2^{K+2}-1} a_n$$

$$\le S_{2^{K+1}-1} + \sum_{n=2^{K+1}}^{2^{K+2}-1} a_{2^{K+1}}$$

$$= S_{2^{K+1}-1} + 2^{K+1} a_{2^{K+1}}.$$

Combining these inequalities with the induction hypothesis

$$S_{2^{K+1}-1} \le T_K \le 2 S_{2^K}$$

we obtain

$$S_{2^{K+2}-1} \le T_{K+1} \le 2 S_{2^{K+1}}$$


as desired. This proves the claim.

From this claim we see that if $(S_N)_{N=1}^{\infty}$ is bounded, then $(S_{2^K})_{K=0}^{\infty}$ is bounded, and
hence $(T_K)_{K=0}^{\infty}$ is bounded. Conversely, if $(T_K)_{K=0}^{\infty}$ is bounded, then the claim implies
that $S_{2^{K+1}-1}$ is bounded, i.e., there is an $M$ such that $S_{2^{K+1}-1} \le M$ for all natural
numbers $K$. But one can easily show (using induction) that $2^{K+1} - 1 \ge K + 1$, and
hence that $S_{K+1} \le M$ for all natural numbers $K$, hence $(S_N)_{N=1}^{\infty}$ is bounded.
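
The two-sided bound $S_{2^{K+1}-1} \le T_K \le 2 S_{2^K}$ of Lemma 7.3.6 can be tested on concrete decreasing sequences. The sketch below assumes the sample sequences $a_n = 1/n$ and $a_n = 1/n^2$ and a small range of $K$; the helper names are illustrative, and a finite check of this kind is of course no replacement for the induction above.

```python
from fractions import Fraction


def S(N, a):
    """Partial sum S_N = a(1) + ... + a(N)."""
    return sum((a(n) for n in range(1, N + 1)), Fraction(0))


def T(K, a):
    """Condensed partial sum T_K = sum_{k=0}^{K} 2^k a(2^k)."""
    return sum((2 ** k * a(2 ** k) for k in range(K + 1)), Fraction(0))


def harmonic(n):
    return Fraction(1, n)


def inverse_square(n):
    return Fraction(1, n * n)


def lemma_holds(a, K):
    # The chain S_{2^{K+1}-1} <= T_K <= 2 S_{2^K} from Lemma 7.3.6.
    return S(2 ** (K + 1) - 1, a) <= T(K, a) <= 2 * S(2 ** K, a)


checks = [lemma_holds(a, K) for a in (harmonic, inverse_square) for K in range(8)]
print(all(checks), T(3, harmonic))
```

For the harmonic sequence each condensed term $2^k a_{2^k}$ equals 1, so $T_K = K + 1$ is unbounded, which is consistent with the divergence of $\sum 1/n$.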


Corollary 7.3.7 Let $q > 0$ be a real number. Then the series $\sum_{n=1}^{\infty} 1/n^q$ is convergent
when $q > 1$ and divergent when $q \le 1$.


Proof The sequence $(1/n^q)_{n=1}^{\infty}$ is non-negative and decreasing (by Lemma 5.6.9(d)
and Lemma 6.7.3), and so the Cauchy criterion applies. Thus this series is convergent
if and only if

$$\sum_{k=0}^{\infty} 2^k \frac{1}{(2^k)^q}$$

is convergent. But by the laws of exponentiation (Lemma 5.6.9 and Lemma 6.7.3)
we can rewrite this as the geometric series

$$\sum_{k=0}^{\infty} (2^{1-q})^k.$$

As mentioned earlier, the geometric series $\sum_{k=0}^{\infty} x^k$ converges if and only if $|x| < 1$.
Thus the series $\sum_{n=1}^{\infty} 1/n^q$ will converge if and only if $|2^{1-q}| < 1$, which happens
if and only if $q > 1$ (why? Try proving it just using Lemma 5.6.9 and Lemma 6.7.3,
and without using logarithms).


In particular, the series $\sum_{n=1}^{\infty} 1/n$ (also known as the harmonic series) is divergent,
as claimed earlier. However, the series $\sum_{n=1}^{\infty} 1/n^2$ is convergent.

Remark 7.3.8 The quantity $\sum_{n=1}^{\infty} 1/n^q$, when it converges, is called $\zeta(q)$, the
Riemann-zeta function of q. This function is very important in number theory, and in 
particular in the distribution of the primes; there is a very famous unsolved problem 
regarding this function, called the Riemann hypothesis, but to discuss it further is far 
beyond the scope of this text. I will mention however that there is a US$ 1 million 
prize—and instant fame among all mathematicians—attached to the solution to this 
problem.


— Exercises —

Exercise 7.3.1 Use Proposition 7.3.1 to prove Corollary 7.3.2. 


Exercise 7.3.2 Prove Lemma 7.3.3. (Hint: for the first part, use the zero test. For the second part,
first use induction to establish the geometric series formula

$$\sum_{n=0}^{N} x^n = (1 - x^{N+1})/(1 - x)$$

and then apply Lemma 6.5.2.)


Exercise 7.3.3 Let $\sum_{n=0}^{\infty} a_n$ be an absolutely convergent series of real numbers such that
$\sum_{n=0}^{\infty} |a_n| = 0$. Show that $a_n = 0$ for every natural number $n$.


7.4 Rearrangement of Series 


One feature of finite sums is that no matter how one rearranges the terms in a sequence, 
the total sum is the same. For instance, 


$$a_1 + a_2 + a_3 + a_4 + a_5 = a_4 + a_3 + a_5 + a_1 + a_2.$$


A more rigorous statement of this, involving bijections, has already appeared earlier, 
see Remark 7.1.12. 

One can ask whether the same thing is true for infinite series. If all the terms are 
non-negative, the answer is yes: 


Proposition 7.4.1 Let $\sum_{n=0}^{\infty} a_n$ be a convergent series of non-negative real numbers,
and let $f: \mathbf{N} \to \mathbf{N}$ be a bijection. Then $\sum_{m=0}^{\infty} a_{f(m)}$ is also convergent, and has the
same sum:

$$\sum_{n=0}^{\infty} a_n = \sum_{m=0}^{\infty} a_{f(m)}.$$


Proof We introduce the partial sums $S_N := \sum_{n=0}^{N} a_n$ and $T_M := \sum_{m=0}^{M} a_{f(m)}$. We
know that the sequences $(S_N)_{N=0}^{\infty}$ and $(T_M)_{M=0}^{\infty}$ are increasing. Write $L := \sup(S_N)_{N=0}^{\infty}$
and $L' := \sup(T_M)_{M=0}^{\infty}$. By Proposition 6.3.8 we know that $L$ is finite, and in fact
$L = \sum_{n=0}^{\infty} a_n$; by Proposition 6.3.8 again we see that we will thus be done as soon
as we can show that $L' = L$.

Fix $M$, and let $Y$ be the set $Y := \{m \in \mathbf{N} : m \le M\}$. Note that $f$ is a bijection
between $Y$ and $f(Y)$. By Proposition 7.1.11, we have

$$T_M = \sum_{m=0}^{M} a_{f(m)} = \sum_{m \in Y} a_{f(m)} = \sum_{n \in f(Y)} a_n.$$

The sequence $(f(m))_{m=0}^{M}$ is finite, hence bounded, i.e., there exists an $N$ such that
$f(m) \le N$ for all $m \le M$. In particular $f(Y)$ is a subset of $\{n \in \mathbf{N} : n \le N\}$, and so
by Proposition 7.1.11 again (and the assumption that all the $a_n$ are non-negative)

$$T_M = \sum_{n \in f(Y)} a_n \le \sum_{n \in \{n \in \mathbf{N} : n \le N\}} a_n = \sum_{n=0}^{N} a_n = S_N.$$

But since $(S_N)_{N=0}^{\infty}$ has a supremum of $L$, we thus see that $S_N \le L$, and hence that
$T_M \le L$ for all $M$. Since $L'$ is the least upper bound of $(T_M)_{M=0}^{\infty}$, this implies that
$L' \le L$.

A very similar argument (using the inverse $f^{-1}$ instead of $f$) shows that every
$S_N$ is bounded above by $L'$, and hence $L \le L'$. Combining these two inequalities we
obtain $L = L'$, as desired.


Example 7.4.2 From Corollary 7.3.7 we know that the series

$$\sum_{n=1}^{\infty} 1/n^2 = 1 + 1/4 + 1/9 + 1/16 + 1/25 + 1/36 + \dots$$

is convergent. Thus, if we interchange every pair of terms, to obtain

$$1/4 + 1 + 1/16 + 1/9 + 1/36 + 1/25 + \dots$$

we know that this series is also convergent, and has the same sum. (It turns out that
the value of this sum is $\zeta(2) = \pi^2/6$, a fact which we shall prove in Exercise 5.5.2.)
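
The comparison $T_M \le S_N$ used in the proof of Proposition 7.4.1 can be seen concretely for the pair-swapping rearrangement of Example 7.4.2. The sketch below re-indexes the series from 0, so that $a_n = 1/(n+1)^2$ (an indexing assumption made here to match the proposition), and takes $N$ to be the largest value of $f(m)$ for $m \le M$; the function names are illustrative.

```python
from fractions import Fraction


def a(n):
    # a_0, a_1, a_2, ... = 1, 1/4, 1/9, ... (the series re-indexed from 0).
    return Fraction(1, (n + 1) ** 2)


def f(m):
    # The bijection of N that interchanges each pair (0, 1), (2, 3), ...
    return m + 1 if m % 2 == 0 else m - 1


def S(N):
    """Partial sum of the original series up to index N."""
    return sum((a(n) for n in range(N + 1)), Fraction(0))


def T(M):
    """Partial sum of the rearranged series up to index M."""
    return sum((a(f(m)) for m in range(M + 1)), Fraction(0))


# T_M <= S_N whenever N bounds f(0), ..., f(M), as in the proof.
bounded = all(T(M) <= S(max(f(m) for m in range(M + 1))) for M in range(50))

# After each complete pair the two partial sums coincide exactly.
agree_on_pairs = all(T(M) == S(M) for M in range(1, 50, 2))

print(bounded, agree_on_pairs, float(T(49)))
```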


Now we ask what happens when the series is not non-negative. Then as long as the 
series is absolutely convergent, we can still do rearrangements: 


Proposition 7.4.3 (Rearrangement of series) Let $\sum_{n=0}^{\infty} a_n$ be an absolutely convergent
series of real numbers, and let $f: \mathbf{N} \to \mathbf{N}$ be a bijection. Then $\sum_{m=0}^{\infty} a_{f(m)}$ is
also absolutely convergent, and has the same sum:

$$\sum_{n=0}^{\infty} a_n = \sum_{m=0}^{\infty} a_{f(m)}.$$


Proof (Optional) We apply Proposition 7.4.1 to the infinite series $\sum_{n=0}^{\infty} |a_n|$, which
by hypothesis is a convergent series of non-negative numbers. If we write
$L := \sum_{n=0}^{\infty} |a_n|$, then by Proposition 7.4.1 we know that $\sum_{m=0}^{\infty} |a_{f(m)}|$ also converges to
$L$.

Now write $L' := \sum_{n=0}^{\infty} a_n$. We have to show that $\sum_{m=0}^{\infty} a_{f(m)}$ also converges to
$L'$. In other words, given any $\varepsilon > 0$, we have to find an $M$ such that $\sum_{m=0}^{M'} a_{f(m)}$ is
$\varepsilon$-close to $L'$ for every $M' \ge M$.

Since $\sum_{n=0}^{\infty} |a_n|$ is convergent, we can use Proposition 7.2.5 and find an $N_1$ such
that $\sum_{n=p}^{q} |a_n| \le \varepsilon/2$ for all $p, q \ge N_1$. Since $\sum_{n=0}^{\infty} a_n$ converges to $L'$, the partial
sums $\sum_{n=0}^{N} a_n$ also converge to $L'$, and so there exists $N \ge N_1$ such that $\sum_{n=0}^{N} a_n$ is
$\varepsilon/2$-close to $L'$.

Now the sequence $(f^{-1}(n))_{n=0}^{N}$ is finite, hence bounded, so there exists an $M$ such
that $f^{-1}(n) \le M$ for all $0 \le n \le N$. In particular, for any $M' \ge M$, the set
$\{f(m) : m \in \mathbf{N}; m \le M'\}$ contains $\{n \in \mathbf{N} : n \le N\}$ (why?). So by Proposition 7.1.11, for
any $M' \ge M$

$$\sum_{m=0}^{M'} a_{f(m)} = \sum_{n \in \{f(m) : m \in \mathbf{N}; m \le M'\}} a_n = \sum_{n=0}^{N} a_n + \sum_{n \in X} a_n$$

where $X$ is the set

$$X = \{f(m) : m \in \mathbf{N}; m \le M'\} \setminus \{n \in \mathbf{N} : n \le N\}.$$

The set $X$ is finite, and is therefore bounded by some natural number $q$; we must
therefore have

$$X \subseteq \{n \in \mathbf{N} : N + 1 \le n \le q\}$$

(why?). Thus

$$\left| \sum_{n \in X} a_n \right| \le \sum_{n \in X} |a_n| \le \sum_{n=N+1}^{q} |a_n| \le \varepsilon/2$$

by our choice of $N$. Thus $\sum_{m=0}^{M'} a_{f(m)}$ is $\varepsilon/2$-close to $\sum_{n=0}^{N} a_n$, which as mentioned
before is $\varepsilon/2$-close to $L'$. Thus $\sum_{m=0}^{M'} a_{f(m)}$ is $\varepsilon$-close to $L'$ for all $M' \ge M$, as
desired.


Surprisingly, when the series is not absolutely convergent, then the rearrangements 
are very badly behaved. 


Example 7.4.4 Consider the series

$$1/3 - 1/4 + 1/5 - 1/6 + 1/7 - 1/8 + \dots.$$

This series is not absolutely convergent (why?), but is convergent by the alternating
series test, and in fact the sum can be seen to converge to a positive number (in
fact, it converges to $\ln(2) - 1/2 = 0.193147\dots$, see Example 4.5.7). Basically, the
reason why the sum is positive is because the quantities $(1/3 - 1/4)$, $(1/5 - 1/6)$,
$(1/7 - 1/8)$ are all positive, which can then be used to show that every partial sum
is positive. (Why? You have to break into two cases, depending on whether there are
an even or odd number of terms in the partial sum.)

If, however, we rearrange the series to have two negative terms to each positive
term, thus

$$1/3 - 1/4 - 1/6 + 1/5 - 1/8 - 1/10 + 1/7 - 1/12 - 1/14 + \dots$$

then the partial sums quickly become negative (this is because $(1/3 - 1/4 - 1/6)$,
$(1/5 - 1/8 - 1/10)$, and more generally $(1/(2n + 1) - 1/4n - 1/(4n + 2))$ are all
negative), and so this series converges to a negative quantity; in fact, it converges to

$$(\ln(2) - 1)/2 = -0.153426\dots.$$
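
The two limits quoted in Example 7.4.4 can be approximated numerically by summing the series block by block. The sketch below uses floating-point arithmetic and a cutoff of 100000 blocks, both of which are choices made here for illustration; the function names are hypothetical. It offers numerical evidence only and does not establish convergence.

```python
import math


def original_partial_sum(pairs):
    """Sum the first `pairs` blocks (1/(2n+1) - 1/(2n+2)), n = 1, 2, ..."""
    return sum(1 / (2 * n + 1) - 1 / (2 * n + 2) for n in range(1, pairs + 1))


def rearranged_partial_sum(blocks):
    """Sum the first `blocks` blocks (1/(2n+1) - 1/(4n) - 1/(4n+2))."""
    return sum(1 / (2 * n + 1) - 1 / (4 * n) - 1 / (4 * n + 2)
               for n in range(1, blocks + 1))


s = original_partial_sum(100_000)    # close to ln(2) - 1/2
t = rearranged_partial_sum(100_000)  # close to (ln(2) - 1)/2

print(s, math.log(2) - 0.5)
print(t, (math.log(2) - 1) / 2)
```

Both sums use exactly the same terms, yet the first is positive and the second negative, which is the phenomenon the example describes.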


There is in fact a surprising result of Riemann, which shows that a series which is 
conditionally convergent (that is, convergent but not absolutely convergent) can in 
fact be rearranged to converge to any value (or rearranged to diverge, in fact—see 
Exercise 8.2.6); see Theorem 8.2.8. 


To summarize, rearranging series is safe when the series is absolutely convergent, 
but is somewhat dangerous otherwise. (This is not to say that rearranging a series that 
is not absolutely convergent necessarily gives you the wrong answer—for instance, 
in theoretical physics one often performs similar maneuvres, and one still (usually) 
obtains a correct answer at the end—but doing so is risky, unless it is backed by a 
rigorous result such as Proposition 7.4.3.) 


— Exercises — 


Exercise 7.4.1 Let $\sum_{n=0}^{\infty} a_n$ be an absolutely convergent series of real numbers. Let $f: \mathbf{N} \to \mathbf{N}$
be an increasing function (i.e., $f(n + 1) > f(n)$ for all $n \in \mathbf{N}$). Show that $\sum_{n=0}^{\infty} a_{f(n)}$ is also an
absolutely convergent series. (Hint: try to compare each partial sum of $\sum_{n=0}^{\infty} a_{f(n)}$ with a (slightly
different) partial sum of $\sum_{n=0}^{\infty} a_n$.) What happens if we assume $f$ is merely one-to-one, rather than
increasing?


Exercise 7.4.2 Obtain an alternate proof of Proposition 7.4.3 using Proposition 7.4.1, Proposition
7.2.13, and expressing $a_n$ as the difference of $a_n + |a_n|$ and $|a_n|$. (This argument is due to Will
Ballard.)


7.5 The Root and Ratio Tests 


Now we can state and prove the famous root and ratio tests for convergence. 


Theorem 7.5.1 (Root test) Let $\sum_{n=m}^\infty a_n$ be a series of real numbers, and let
$\alpha := \limsup_{n \to \infty} |a_n|^{1/n}$.


(a) If $\alpha < 1$, then the series $\sum_{n=m}^\infty a_n$ is absolutely convergent (and hence convergent).

(b) If $\alpha > 1$, then the series $\sum_{n=m}^\infty a_n$ is not convergent (and hence cannot be absolutely
convergent either).

(c) If $\alpha = 1$, we cannot assert any conclusion.
Proof By Proposition 7.2.13(c), we may assume without loss of generality that
$m \geq 1$; in particular $|a_n|^{1/n}$ is well-defined for any $n \geq m$.

First suppose that $\alpha < 1$. Note that we must have $\alpha \geq 0$, since $|a_n|^{1/n} \geq 0$ for every
$n$. Then we can find an $\varepsilon > 0$ such that $0 < \alpha + \varepsilon < 1$ (for instance, we can set
$\varepsilon := (1 - \alpha)/2$). By Proposition 6.4.12(a), there exists an $N \geq m$ such that
$|a_n|^{1/n} \leq \alpha + \varepsilon$ for all $n \geq N$. In other words, we have $|a_n| \leq (\alpha + \varepsilon)^n$ for all $n \geq N$. But from the
geometric series we have that $\sum_{n=N}^\infty (\alpha + \varepsilon)^n$ is absolutely convergent, since
$0 < \alpha + \varepsilon < 1$ (note that the fact that we start from $N$ is irrelevant by Proposition 7.2.13(c)).
Thus by the comparison test, we see that $\sum_{n=N}^\infty a_n$ is absolutely convergent, and thus
$\sum_{n=m}^\infty a_n$ is absolutely convergent, by Proposition 7.2.13(c) again.

Now suppose that $\alpha > 1$. Then by Proposition 6.4.12(b), we see that for every
$N \geq m$ there exists an $n \geq N$ such that $|a_n|^{1/n} > 1$, and hence that $|a_n| > 1$. In
particular, $(a_n)_{n=N}^\infty$ is not 1-close to 0 for any $N$, and hence $(a_n)_{n=m}^\infty$ is not eventually
1-close to 0. In particular, $(a_n)_{n=m}^\infty$ does not converge to zero. Thus by the zero test,
$\sum_{n=m}^\infty a_n$ is not convergent.

For $\alpha = 1$, see Exercise 7.5.3.


The root test is phrased using the limit superior, but of course if $\lim_{n \to \infty} |a_n|^{1/n}$
converges then the limit is the same as the limit superior. Thus one can phrase the
root test using the limit instead of the limit superior, but only when the limit exists.

The root test is sometimes difficult to use; however we can replace roots by ratios 
using the following lemma. 


Lemma 7.5.2 Let $(c_n)_{n=m}^\infty$ be a sequence of positive numbers. Then we have

$$\liminf_{n \to \infty} \frac{c_{n+1}}{c_n} \leq \liminf_{n \to \infty} c_n^{1/n} \leq \limsup_{n \to \infty} c_n^{1/n} \leq \limsup_{n \to \infty} \frac{c_{n+1}}{c_n}.$$


Proof There are three inequalities to prove here. The middle inequality follows from 
Proposition 6.4.12(c). We shall prove the last inequality, and leave the first one to 
Exercise 7.5.1. 

Write $L := \limsup_{n \to \infty} \frac{c_{n+1}}{c_n}$. If $L = +\infty$ then there is nothing to prove (since
$x \leq +\infty$ for every extended real number $x$), so we may assume that $L$ is a finite real
number. (Note that $L$ cannot equal $-\infty$; why?) Since $\frac{c_{n+1}}{c_n}$ is always positive, we
know that $L \geq 0$.

Let $\varepsilon > 0$. By Proposition 6.4.12(a), we know that there exists an $N \geq m$ such that
$\frac{c_{n+1}}{c_n} \leq L + \varepsilon$ for all $n \geq N$. Without loss of generality we may assume that $N \geq 1$.
This implies that $c_{n+1} \leq c_n (L + \varepsilon)$ for all $n \geq N$. By induction this implies that

$$c_n \leq c_N (L + \varepsilon)^{n-N} \text{ for all } n \geq N$$

(why?). If we write $A := c_N (L + \varepsilon)^{-N}$ then we have

$$c_n \leq A (L + \varepsilon)^n$$

and thus

$$c_n^{1/n} \leq A^{1/n} (L + \varepsilon)$$

for all $n \geq N$. But we have

$$\lim_{n \to \infty} A^{1/n} (L + \varepsilon) = L + \varepsilon$$

by the limit laws (Theorem 6.1.19) and Lemma 6.5.3. Thus by the comparison principle
(Lemma 6.4.13) we have

$$\limsup_{n \to \infty} c_n^{1/n} \leq L + \varepsilon.$$

But this is true for all $\varepsilon > 0$, so this must imply that

$$\limsup_{n \to \infty} c_n^{1/n} \leq L$$

(why? prove by contradiction), as desired.
From Theorem 7.5.1 and Lemma 7.5.2 (and Exercise 7.5.3) we have 


Corollary 7.5.3 (Ratio test) Let $\sum_{n=m}^\infty a_n$ be a series of non-zero numbers. (The
non-zero hypothesis is required so that the ratios $|a_{n+1}|/|a_n|$ appearing below are
well-defined.)

• If $\limsup_{n \to \infty} \frac{|a_{n+1}|}{|a_n|} < 1$, then the series $\sum_{n=m}^\infty a_n$ is absolutely convergent (hence
convergent).

• If $\liminf_{n \to \infty} \frac{|a_{n+1}|}{|a_n|} > 1$, then the series $\sum_{n=m}^\infty a_n$ is not convergent (and thus
cannot be absolutely convergent).

• In the remaining cases, we cannot assert any conclusion.
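To see how the root and ratio quantities can differ, here is a small numerical sketch. The sequence $c_n = 2^{-n + (-1)^n}$ is a hypothetical example chosen for illustration (it does not appear in the text), and the minimum and maximum over a finite window of indices are only rough stand-ins for the limit inferior and limit superior.

```python
def c(n: int) -> float:
    """Illustrative sequence c_n = 2^(-n + (-1)^n); an assumption, not from the text."""
    return 2.0 ** (-n + (-1) ** n)


def ratio_and_root_window(start: int, stop: int):
    """Return (min ratio, min root, max root, max ratio) over start <= n < stop.

    These finite-window extremes only approximate lim inf and lim sup.
    """
    ratios = [c(n + 1) / c(n) for n in range(start, stop)]
    roots = [c(n) ** (1.0 / n) for n in range(start, stop)]
    return min(ratios), min(roots), max(roots), max(ratios)


if __name__ == "__main__":
    print(ratio_and_root_window(200, 400))
```

For this sequence the ratios alternate between 1/8 and 2, so the ratio test asserts nothing, while the roots approach 1/2 and the root test gives absolute convergence. The four numbers also appear in the order predicted by Lemma 7.5.2.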
Another consequence of Lemma 7.5.2 is the following limit: 
Proposition 7.5.4 We have $\lim_{n \to \infty} n^{1/n} = 1$.


Proof By Lemma 7.5.2 we have

$$\limsup_{n \to \infty} n^{1/n} \leq \limsup_{n \to \infty} (n + 1)/n = \limsup_{n \to \infty} 1 + 1/n = 1$$

by Proposition 6.1.11 and limit laws (Theorem 6.1.19). Similarly we have

$$\liminf_{n \to \infty} n^{1/n} \geq \liminf_{n \to \infty} (n + 1)/n = \liminf_{n \to \infty} 1 + 1/n = 1.$$

The claim then follows from Proposition 6.4.12(c) and (f).


Remark 7.5.5 In addition to the ratio and root tests, another very useful convergence 
test is the integral test, which we will cover in Proposition 11.6.4. 
— Exercises —


Exercise 7.5.1 Prove the first inequality in Lemma 7.5.2.


Exercise 7.5.2 Let $x$ be a real number with $|x| < 1$, and $q$ be a real number. Show that the series
$\sum_{n=1}^\infty n^q x^n$ is absolutely convergent, and that $\lim_{n \to \infty} n^q x^n = 0$.


Exercise 7.5.3 Give an example of a divergent series $\sum_{n=1}^\infty a_n$ of positive numbers $a_n$ such that
$\lim_{n \to \infty} a_{n+1}/a_n = \lim_{n \to \infty} a_n^{1/n} = 1$, and give an example of a convergent series $\sum_{n=1}^\infty b_n$ of
positive numbers $b_n$ such that $\lim_{n \to \infty} b_{n+1}/b_n = \lim_{n \to \infty} b_n^{1/n} = 1$. (Hint: use Corollary 7.3.7.)
This shows that the ratio and root tests can be inconclusive even when the summands are positive
and all the limits converge.


Chapter 8
Infinite Sets


We now return to the study of set theory, and specifically to the study of cardinality
of sets which are infinite (i.e., sets which do not have cardinality $n$ for any natural
number $n$), a topic which was initiated in Sect. 3.6.


8.1 Countability 


From Proposition 3.6.14c we know that if $X$ is a finite set, and $Y$ is a proper subset
of $X$, then $Y$ does not have equal cardinality with $X$. However, this is not the case
for infinite sets. For instance, from Theorem 3.6.12 we know that the set $\mathbf{N}$ of natural
numbers is infinite. The set $\mathbf{N} - \{0\}$ is also infinite, thanks to Proposition 3.6.14a
(why?), and is a proper subset of $\mathbf{N}$. However, the set $\mathbf{N} - \{0\}$, despite being “smaller”
than $\mathbf{N}$, still has the same cardinality as $\mathbf{N}$, because the function $f: \mathbf{N} \to \mathbf{N} - \{0\}$
defined by $f(n) := n + 1$ is a bijection from $\mathbf{N}$ to $\mathbf{N} - \{0\}$. (Why?) This is one
characteristic of infinite sets; see Exercise 8.1.1.

We now distinguish two types of infinite sets: the countable sets and the
uncountable sets.


Definition 8.1.1 (Countable sets) A set X is said to be countably infinite (or just 
countable) iff it has equal cardinality with the natural numbers N. A set X is said to 
be at most countable iff it is either countable or finite. We say that a set is uncountable 
if it is infinite but not countable. 


Remark 8.1.2 Countably infinite sets are also called denumerable sets. 


Examples 8.1.3 From the preceding discussion we see that $\mathbf{N}$ is countable, and so
is $\mathbf{N} - \{0\}$. Another example of a countable set is the even natural numbers
$\{2n : n \in \mathbf{N}\}$, since the function $f(n) := 2n$ provides a bijection between $\mathbf{N}$ and the even
natural numbers (why?).
Let $X$ be a countable set. Then, by definition, we know that there exists a bijection
$f: \mathbf{N} \to X$. Thus, every element of $X$ can be written in the form $f(n)$ for exactly
one natural number $n$. Informally, we thus have

$$X = \{f(0), f(1), f(2), f(3), \ldots\}.$$

Thus, a countable set can be arranged in a sequence, so that we have a zeroth element
$f(0)$, followed by a first element $f(1)$, then a second element $f(2)$, and so forth, in
such a way that all these elements $f(0), f(1), f(2), \ldots$ are all distinct, and together
they fill out all of $X$. (This is why these sets are called countable; because we can
literally count them one by one, starting from $f(0)$, then $f(1)$, and so forth.)
Viewed in this way, it is clear why the natural numbers

$$\mathbf{N} = \{0, 1, 2, 3, \ldots\},$$

the positive integers

$$\mathbf{N} - \{0\} = \{1, 2, 3, \ldots\},$$

and the even natural numbers

$$\{0, 2, 4, 6, 8, \ldots\}$$

are countable. However, it is not as obvious whether the integers

$$\mathbf{Z} = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$$

or the rationals

$$\mathbf{Q} = \{0, 1/4, -2/3, \ldots\}$$

or the reals

$$\mathbf{R} = \{0, \sqrt{2}, -\pi, 2.5, \ldots\}$$

are countable or not; for instance, it is not yet clear whether we can arrange the
real numbers in a sequence $f(0), f(1), f(2), \ldots$. We will answer these questions
shortly.

From Proposition 3.6.4 and Theorem 3.6.12, we know that countable sets are 
infinite; however it is not so clear whether all infinite sets are countable. Again, we 
will answer those questions shortly. We first need the following important principle. 


Proposition 8.1.4 (Well-ordering principle) Let $X$ be a non-empty subset of the
natural numbers $\mathbf{N}$. Then there exists exactly one element $n \in X$ such that $n \leq m$ for
all $m \in X$. In other words, every non-empty set of natural numbers has a minimum
element.


Proof See Exercise 8.1.2. 
We will refer to the element $n$ given by the well-ordering principle as the minimum
of $X$, and write it as $\min(X)$. Thus for instance the minimum of the set $\{2, 4, 6, 8, \ldots\}$
is 2. This minimum is clearly the same as the infimum of $X$, as defined in Definition
5.5.10 (why?).


Proposition 8.1.5 Let $X$ be an infinite subset of the natural numbers $\mathbf{N}$. Then there
exists a unique bijection $f: \mathbf{N} \to X$ which is increasing, in the sense that $f(n + 1) > f(n)$
for all $n \in \mathbf{N}$. In particular, $X$ has equal cardinality with $\mathbf{N}$ and is hence
countable.


Proof We will give an incomplete sketch of the proof, with some gaps marked by a 
question mark (?); these gaps will be filled in Exercise 8.1.3. 
We now define a sequence $a_0, a_1, a_2, \ldots$ of natural numbers recursively by the
formula

$$a_n := \min\{x \in X : x \neq a_m \text{ for all } m < n\}.$$

Intuitively speaking, $a_0$ is the smallest element of $X$; $a_1$ is the second smallest element
of $X$, i.e., the smallest element of $X$ once $a_0$ is removed; $a_2$ is the third smallest
element of $X$; and so forth. Observe that in order to define $a_n$, one only needs to
know the values of $a_m$ for all $m < n$, so this definition is recursive. Also, since $X$ is
infinite, the set $\{x \in X : x \neq a_m \text{ for all } m < n\}$ is infinite(?), hence non-empty. Thus
by the well-ordering principle, the minimum $\min\{x \in X : x \neq a_m \text{ for all } m < n\}$ is
always well-defined.

One can show(?) that $a_n$ is an increasing sequence, i.e.,

$$a_0 < a_1 < a_2 < \ldots$$

and in particular that(?) $a_n \neq a_m$ for all $n \neq m$. Also, we have(?) $a_n \in X$ for each
natural number $n$.

Now define the function $f: \mathbf{N} \to X$ by $f(n) := a_n$. From the previous paragraph
we know that $f$ is one-to-one. Now we show that $f$ is onto. In other words, we claim
that for every $x \in X$, there exists an $n$ such that $a_n = x$.

Let $x \in X$. Suppose for sake of contradiction that $a_n \neq x$ for every natural number
$n$. Then this implies(?) that $x$ is an element of the set $\{x \in X : x \neq a_m \text{ for all } m < n\}$
for all $n$. By definition of $a_n$, this implies that $x \geq a_n$ for every natural number $n$.
However, since $a_n$ is an increasing sequence, we have $a_n \geq n$(?), and hence $x \geq n$ for
every natural number $n$. In particular we have $x \geq x + 1$, which is a contradiction.
Thus we must have $a_n = x$ for some natural number $n$, and hence $f$ is onto.

Since $f: \mathbf{N} \to X$ is both one-to-one and onto, it is a bijection. We have thus
found at least one increasing bijection $f$ from $\mathbf{N}$ to $X$. Now suppose for sake of
contradiction that there was at least one other increasing bijection $g$ from $\mathbf{N}$ to $X$
which was not equal to $f$. Then the set $\{n \in \mathbf{N} : g(n) \neq f(n)\}$ is non-empty, and
define $m := \min\{n \in \mathbf{N} : g(n) \neq f(n)\}$, thus in particular $g(m) \neq f(m) = a_m$, and
$g(n) = f(n) = a_n$ for all $n < m$. But we then must have(?)

$$g(m) = \min\{x \in X : x \neq a_t \text{ for all } t < m\} = a_m,$$

a contradiction. Thus there is no other increasing bijection from $\mathbf{N}$ to $X$ other than
$f$. □
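The recursive definition of $a_n$ in this proof can be transcribed almost literally. The sketch below is illustrative only: the infinite set $X$ is assumed to be given by a membership test `in_X`, and the primes are used as a hypothetical choice of $X$. If $X$ were finite the search would eventually loop forever, which mirrors the place where the proof uses that $X$ is infinite.

```python
from typing import Callable, List


def first_terms(in_X: Callable[[int], bool], how_many: int) -> List[int]:
    """Return a_0, ..., a_{how_many - 1}, where
    a_n = min{x in X : x != a_m for all m < n}."""
    terms: List[int] = []
    for _ in range(how_many):
        x = 0
        # Search upward from 0 for the least element of X not yet used.
        while not in_X(x) or x in terms:
            x += 1
        terms.append(x)
    return terms


def is_prime(x: int) -> bool:
    """Membership test for the illustrative set X of prime numbers."""
    return x >= 2 and all(x % d for d in range(2, int(x ** 0.5) + 1))


if __name__ == "__main__":
    print(first_terms(is_prime, 8))
```

The output lists the chosen set in increasing order, and each term satisfies $a_n \geq n$, as claimed in the proof.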


Since finite sets are at most countable by definition, we thus have 


Corollary 8.1.6 All subsets of the natural numbers are at most countable. 


Corollary 8.1.7 If X is an at most countable set, and Y is a subset of X, then Y is 
at most countable. 


Proof If $X$ is finite then this follows from Proposition 3.6.14c, so assume $X$ is
countable. Then there is a bijection $f: X \to \mathbf{N}$ between $X$ and $\mathbf{N}$. Since $Y$ is a
subset of $X$, and $f$ is a bijection from $X$ to $\mathbf{N}$, then when we restrict $f$ to $Y$, we
obtain a bijection between $Y$ and $f(Y)$. (Why is this a bijection?) Thus $f(Y)$ has
equal cardinality with $Y$. But $f(Y)$ is a subset of $\mathbf{N}$, and hence at most countable by
Corollary 8.1.6. Hence $Y$ is also at most countable.


Proposition 8.1.8 Let $Y$ be a set, and let $f: \mathbf{N} \to Y$ be a function. Then $f(\mathbf{N})$ is at
most countable.


Proof See Exercise 8.1.4.


Corollary 8.1.9 Let $X$ be a countable set, and let $f: X \to Y$ be a function. Then
$f(X)$ is at most countable.


Proof See Exercise 8.1.5.


Proposition 8.1.10 Let $X$ be a countable set, and let $Y$ be a countable set. Then
$X \cup Y$ is a countable set.


Proof See Exercise 8.1.7. 


To summarize, any subset or image of a countable set is at most countable, and 
any finite union of countable sets is still countable. We can now establish countability 
of the integers. 


Corollary 8.1.11 The integers Z are countable. 


Proof We already know that the set $\mathbf{N} = \{0, 1, 2, 3, \ldots\}$ of natural numbers is
countable. The set $-\mathbf{N}$ defined by

$$-\mathbf{N} := \{-n : n \in \mathbf{N}\} = \{0, -1, -2, -3, \ldots\}$$

is also countable, since the map $f(n) := -n$ is a bijection between $\mathbf{N}$ and this set.
Since the integers are the union of $\mathbf{N}$ and $-\mathbf{N}$, the claim follows from Proposition
8.1.10.


To establish countability of the rationals, we need to relate countability with
Cartesian products. In particular, we need to show that the set $\mathbf{N} \times \mathbf{N}$ is countable.
We first need a preliminary lemma:


Lemma 8.1.12 The set

$$A := \{(n, m) \in \mathbf{N} \times \mathbf{N} : 0 \leq m \leq n\}$$

is countable.


Proof Define the sequence $a_0, a_1, a_2, \ldots$ recursively by setting $a_0 := 0$, and
$a_{n+1} := a_n + n + 1$ for all natural numbers $n$. Thus

$$a_0 = 0; \quad a_1 = 0 + 1; \quad a_2 = 0 + 1 + 2; \quad a_3 = 0 + 1 + 2 + 3; \quad \ldots.$$

By induction one can show that $a_n$ is increasing, i.e., that $a_n > a_m$ whenever $n > m$
(why?).

Now define the function $f: A \to \mathbf{N}$ by

$$f(n, m) := a_n + m.$$

We claim that $f$ is one-to-one. In other words, if $(n, m)$ and $(n', m')$ are any two
distinct elements of $A$, then we claim that $f(n, m) \neq f(n', m')$.

To prove this claim, let $(n, m)$ and $(n', m')$ be two distinct elements of $A$. There
are three cases: $n' = n$, $n' > n$, and $n' < n$. First suppose that $n' = n$. Then we must
have $m \neq m'$, otherwise $(n, m)$ and $(n', m')$ would not be distinct. Thus
$a_n + m \neq a_n + m'$, and hence $f(n, m) \neq f(n', m')$, as desired.

Now suppose that $n' > n$. Then $n' \geq n + 1$, and hence

$$f(n', m') = a_{n'} + m' \geq a_{n'} \geq a_{n+1} = a_n + n + 1.$$

But since $(n, m) \in A$, we have $m \leq n < n + 1$, and hence

$$f(n', m') \geq a_n + n + 1 > a_n + m = f(n, m),$$

and thus $f(n', m') \neq f(n, m)$.

The case $n' < n$ is proven similarly, by switching the rôles of $n$ and $n'$ in the
previous argument. Thus we have shown that $f$ is one-to-one. Thus $f$ is a bijection
from $A$ to $f(A)$, and so $A$ has equal cardinality with $f(A)$. But $f(A)$ is a subset of
$\mathbf{N}$, and hence by Corollary 8.1.6 $f(A)$ is at most countable. Therefore $A$ is at most
countable. But $A$ is clearly not finite. (Why? Hint: if $A$ was finite, then every subset
of $A$ would be finite, and in particular $\{(n, 0) : n \in \mathbf{N}\}$ would be finite, but this is
clearly countably infinite, a contradiction.) Thus, $A$ must be countable.
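The map $f(n, m) := a_n + m$ from this proof is easy to compute, and its injectivity can be spot-checked on a finite window of $A$. This is a minimal sketch with illustrative function names; the check on finitely many pairs is of course no substitute for the proof.

```python
def a(n: int) -> int:
    """a_0 := 0 and a_{k+1} := a_k + k + 1, computed by the recursion in the proof."""
    value = 0
    for k in range(n):
        value += k + 1
    return value


def f(n: int, m: int) -> int:
    """f(n, m) := a_n + m, defined only on A = {(n, m) : 0 <= m <= n}."""
    if not 0 <= m <= n:
        raise ValueError("(n, m) must lie in A, i.e. 0 <= m <= n")
    return a(n) + m


if __name__ == "__main__":
    values = [f(n, m) for n in range(6) for m in range(n + 1)]
    print(values)
```

Listing the pairs row by row ($n = 0$, then $n = 1$, and so on) produces distinct natural numbers, in line with the claim that $f$ is one-to-one.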


Corollary 8.1.13 The set $\mathbf{N} \times \mathbf{N}$ is countable.


Proof We already know that the set

$$A := \{(n, m) \in \mathbf{N} \times \mathbf{N} : 0 \leq m \leq n\}$$

is countable. This implies that the set

$$B := \{(n, m) \in \mathbf{N} \times \mathbf{N} : 0 \leq n \leq m\}$$

is also countable, since the map $f: A \to B$ given by $f(n, m) := (m, n)$ is a bijection
from $A$ to $B$ (why?). But since $\mathbf{N} \times \mathbf{N}$ is the union of $A$ and $B$ (why?), the claim
then follows from Proposition 8.1.10.


Corollary 8.1.14 If $X$ and $Y$ are countable, then $X \times Y$ is countable.


Proof See Exercise 8.1.8. 


Corollary 8.1.15 The rationals Q are countable. 


Proof We already know that the integers $\mathbf{Z}$ are countable, which implies that the
non-zero integers $\mathbf{Z} - \{0\}$ are countable (why?). By Corollary 8.1.14, the set

$$\mathbf{Z} \times (\mathbf{Z} - \{0\}) = \{(a, b) : a, b \in \mathbf{Z}, b \neq 0\}$$

is thus countable. If one lets $f: \mathbf{Z} \times (\mathbf{Z} - \{0\}) \to \mathbf{Q}$ be the function $f(a, b) := a/b$
(note that $f$ is well-defined since we prohibit $b$ from being equal to 0), we
see from Corollary 8.1.9 that $f(\mathbf{Z} \times (\mathbf{Z} - \{0\}))$ is at most countable. But we have
$f(\mathbf{Z} \times (\mathbf{Z} - \{0\})) = \mathbf{Q}$ (why? This is basically the definition of the rationals $\mathbf{Q}$).
Thus $\mathbf{Q}$ is at most countable. However, $\mathbf{Q}$ cannot be finite, since it contains the
infinite set $\mathbf{N}$. Thus $\mathbf{Q}$ is countable.


Remark 8.1.16 Because the rationals are countable, we know in principle that it is
possible to arrange the rational numbers as a sequence:

$$\mathbf{Q} = \{a_0, a_1, a_2, a_3, \ldots\}$$

such that every element of the sequence is different from every other element, and
that the elements of the sequence exhaust $\mathbf{Q}$ (i.e., every rational number turns up as
one of the elements $a_n$ of the sequence). However, it is quite difficult (though not
impossible) to actually try and come up with an explicit sequence $a_0, a_1, \ldots$ which
does this; see Exercise 8.1.10.
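For the curious, one way to produce such a sequence on a computer is sketched below. This is only one possible arrangement, chosen here for illustration and not taken from the text: fractions $a/b$ in lowest terms are listed in order of increasing $|a| + b$, which avoids repeats. It gives a procedure rather than the closed-form bijection that Exercise 8.1.10 asks the reader to think about.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd
from typing import Iterator


def rationals() -> Iterator[Fraction]:
    """Yield every rational number exactly once (illustrative ordering)."""
    yield Fraction(0)
    for s in count(2):                 # s = |a| + b with a, b >= 1
        for b in range(1, s):
            a = s - b
            if gcd(a, b) == 1:         # lowest terms only, so no value repeats
                yield Fraction(a, b)
                yield Fraction(-a, b)


if __name__ == "__main__":
    print([str(q) for q in islice(rationals(), 11)])
```

Any given rational $a/b$ in lowest terms is reached once the outer loop gets to $s = |a| + b$, so the sequence exhausts $\mathbf{Q}$.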


— Exercises — 


Exercise 8.1.1 Let $X$ be a set. Show that $X$ is infinite if and only if there exists a proper subset
$Y \subsetneq X$ of $X$ which has the same cardinality as $X$. (This exercise requires the axiom of choice,
Axiom 8.1.)


Exercise 8.1.2. Prove Proposition 8.1.4. (Hint: you can either use induction, or use the principle of 
infinite descent, Exercise 4.4.2, or use the least upper bound (or greatest lower bound) principle, 
Theorem 5.5.9.) Does the well-ordering principle work if we replace the natural numbers by the 
integers? What if we replace the natural numbers by the positive rationals? Explain. 


Exercise 8.1.3 Fill in the gaps marked (?) in Proposition 8.1.5. 
Exercise 8.1.4 Prove Proposition 8.1.8. (Hint: the basic problem here is that $f$ is not assumed to
be one-to-one. Define $A$ to be the set

$$A := \{n \in \mathbf{N} : f(m) \neq f(n) \text{ for all } 0 \leq m < n\};$$

informally speaking, $A$ is the set of natural numbers $n$ for which $f(n)$ does not appear in the
sequence $f(0), f(1), \ldots, f(n - 1)$. Prove that when $f$ is restricted to $A$, it becomes a bijection
from $A$ to $f(\mathbf{N})$. Then use Corollary 8.1.6.)


Exercise 8.1.5 Use Proposition 8.1.8 to prove Corollary 8.1.9. 


Exercise 8.1.6 Let $A$ be a set. Show that $A$ is at most countable if and only if there exists an injective
map $f: A \to \mathbf{N}$ from $A$ to $\mathbf{N}$.


Exercise 8.1.7 Prove Proposition 8.1.10. (Hint: by hypothesis, we have a bijection $f: \mathbf{N} \to X$,
and a bijection $g: \mathbf{N} \to Y$. Now define $h: \mathbf{N} \to X \cup Y$ by setting $h(2n) := f(n)$ and $h(2n + 1) := g(n)$
for every natural number $n$, and show that $h(\mathbf{N}) = X \cup Y$. Then use Corollary 8.1.9, and show
that $X \cup Y$ cannot possibly be finite.)


Exercise 8.1.8 Use Corollary 8.1.13 to prove Corollary 8.1.14. 


Exercise 8.1.9 Suppose that $I$ is an at most countable set, and for each $\alpha \in I$, let $A_\alpha$ be an at most
countable set. Show that the set $\bigcup_{\alpha \in I} A_\alpha$ is also at most countable. In particular, countable unions
of countable sets are countable. (This exercise requires the axiom of choice, see Sect. 8.4.)


Exercise 8.1.10 Find a bijection $f: \mathbf{N} \to \mathbf{Q}$ from the natural numbers to the rationals. (Warning:
this is actually rather tricky to do explicitly; it is difficult to get $f$ to be simultaneously injective
and surjective.)


8.2 Summation on Infinite Sets


We now introduce the concept of summation on countable sets, which will be well- 
defined provided that the sum is absolutely convergent. 


Definition 8.2.1 (Series on countable sets) Let $X$ be a countable set, and let $f: X \to \mathbf{R}$
be a function. We say that the series $\sum_{x \in X} f(x)$ is absolutely convergent iff for
some bijection $g: \mathbf{N} \to X$, the sum $\sum_{n=0}^\infty f(g(n))$ is absolutely convergent. We then
define the sum of $\sum_{x \in X} f(x)$ by the formula

$$\sum_{x \in X} f(x) = \sum_{n=0}^\infty f(g(n)).$$


From Proposition 7.4.3, one can show that these definitions do not depend on the 
choice of g, and so are well-defined. 
We can now give an important theorem about double summations. 


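Theorem 8.2.2 (Fubini’s theorem for infinite sums) Let $f: \mathbf{N} \times \mathbf{N} \to \mathbf{R}$ be a function
such that $\sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f(n, m)$ is absolutely convergent. Then we have

$$\sum_{n=0}^\infty \left( \sum_{m=0}^\infty f(n, m) \right) = \sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f(n, m) = \sum_{(m,n) \in \mathbf{N} \times \mathbf{N}} f(n, m) = \sum_{m=0}^\infty \left( \sum_{n=0}^\infty f(n, m) \right).$$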


In other words, we can switch the order of infinite sums provided that the entire 
sum is absolutely convergent. You should go back and compare this with Example 
1.2.5. 


Proof (A sketch only; this proof is considerably more complex than the other proofs,
and is optional reading.) The second equality follows easily from Proposition 7.4.3
(and Proposition 3.6.4). We shall just prove the first equality, as the third is very
similar (basically one switches the rôle of $n$ and $m$).

Let us first consider the case when $f(n, m)$ is always non-negative (we will deal
with the general case later). Write

$$L := \sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f(n, m);$$

our task is to show that the series $\sum_{n=0}^\infty \left( \sum_{m=0}^\infty f(n, m) \right)$ converges to $L$.

One can easily show that $\sum_{(n,m) \in X} f(n, m) \leq L$ for all finite sets $X \subseteq \mathbf{N} \times \mathbf{N}$.
(Why? Use a bijection $g$ between $\mathbf{N} \times \mathbf{N}$ and $\mathbf{N}$, and then use the fact that $g(X)$
is finite, hence bounded.) In particular, for every $n \in \mathbf{N}$ and $M \in \mathbf{N}$ we have
$\sum_{m=0}^M f(n, m) \leq L$, which implies by Proposition 6.3.8 that $\sum_{m=0}^\infty f(n, m)$ is
convergent for each $n$. Similarly, for any $N \in \mathbf{N}$ and $M \in \mathbf{N}$ we have (by Corollary
7.1.14)

$$\sum_{n=0}^N \sum_{m=0}^M f(n, m) \leq \sum_{(n,m) \in X} f(n, m) \leq L$$

where $X$ is the set $\{(n, m) \in \mathbf{N} \times \mathbf{N} : n \leq N, m \leq M\}$ which is finite by Proposition
3.6.14. Taking suprema of this as $M \to \infty$ we have (by limit laws, and an induction
on $N$)

$$\sum_{n=0}^N \sum_{m=0}^\infty f(n, m) \leq L.$$

By Proposition 6.3.8, this implies that $\sum_{n=0}^\infty \sum_{m=0}^\infty f(n, m)$ converges, and

$$\sum_{n=0}^\infty \sum_{m=0}^\infty f(n, m) \leq L.$$

To finish the proof, it will suffice to show that

$$\sum_{n=0}^\infty \sum_{m=0}^\infty f(n, m) \geq L - \varepsilon$$

for every $\varepsilon > 0$. (Why will this be enough? Prove by contradiction.) So, let $\varepsilon > 0$. By
definition of $L$, we can then find a finite set $X \subseteq \mathbf{N} \times \mathbf{N}$ such that $\sum_{(n,m) \in X} f(n, m) \geq L - \varepsilon$.
(Why?) This set, being finite, must be contained in some set of the form
$Y := \{(n, m) \in \mathbf{N} \times \mathbf{N} : n \leq N; m \leq M\}$. (Why? Use induction.) Thus by Corollary
7.1.14

$$\sum_{n=0}^N \sum_{m=0}^M f(n, m) = \sum_{(n,m) \in Y} f(n, m) \geq \sum_{(n,m) \in X} f(n, m) \geq L - \varepsilon$$

and hence

$$\sum_{n=0}^\infty \sum_{m=0}^\infty f(n, m) \geq \sum_{n=0}^N \sum_{m=0}^\infty f(n, m) \geq \sum_{n=0}^N \sum_{m=0}^M f(n, m) \geq L - \varepsilon$$

as desired.


This proves the claim when the $f(n, m)$ are all non-negative. A similar argument
works when the $f(n, m)$ are all non-positive (in fact, one can simply apply the
result just obtained to the function $-f(n, m)$, and then use limit laws to remove
the $-$). For the general case, note that any function $f(n, m)$ can be written (why?)
as $f_+(n, m) + f_-(n, m)$, where $f_+(n, m)$ is the positive part of $f(n, m)$ (i.e., it
equals $f(n, m)$ when $f(n, m)$ is positive, and 0 otherwise), and $f_-$ is the negative
part of $f(n, m)$ (it equals $f(n, m)$ when $f(n, m)$ is negative, and 0 otherwise). It
is easy to show that if $\sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f(n, m)$ is absolutely convergent, then so are
$\sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f_+(n, m)$ and $\sum_{(n,m) \in \mathbf{N} \times \mathbf{N}} f_-(n, m)$. So now one applies the results just
obtained to $f_+$ and to $f_-$ and adds them together using limit laws to obtain the result
for a general $f$.
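As a numerical illustration of the theorem, the sketch below sums a hypothetical absolutely convergent double series, $f(n, m) = (-1)^{n+m} / (2^n 3^m)$, in both orders. The example function and the truncation at 60 terms in each index are assumptions made here for illustration; the exact value of the double sum is $\frac{2}{3} \cdot \frac{3}{4} = \frac{1}{2}$ by the geometric series formula.

```python
def f(n: int, m: int) -> float:
    """Illustrative summand (-1)^(n+m) / (2^n * 3^m); an assumption, not from the text."""
    return (-1) ** (n + m) / (2 ** n * 3 ** m)


def rows_then_columns(N: int, M: int) -> float:
    """Truncated version of sum over n of (sum over m of f(n, m))."""
    return sum(sum(f(n, m) for m in range(M)) for n in range(N))


def columns_then_rows(N: int, M: int) -> float:
    """Truncated version of sum over m of (sum over n of f(n, m))."""
    return sum(sum(f(n, m) for n in range(N)) for m in range(M))


if __name__ == "__main__":
    print(rows_then_columns(60, 60), columns_then_rows(60, 60))
```

Both orders of summation agree with each other and with 1/2 up to rounding error, as the theorem predicts for absolutely convergent sums.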


There is another characterization of absolutely convergent series. 


Lemma 8.2.3 Let $X$ be a countable set, and let $f: X \to \mathbf{R}$ be a function. Then the
series $\sum_{x \in X} f(x)$ is absolutely convergent if and only if

$$\sup \left\{ \sum_{x \in A} |f(x)| : A \subseteq X, A \text{ finite} \right\} < \infty.$$


Proof See Exercise 8.2.1. 


Inspired by this lemma, we may now define the concept of an absolutely conver- 
gent series even when the set X could be uncountable. (We give some examples of 
uncountable sets in the next section.) 
Definition 8.2.4 Let $X$ be a set (which could be uncountable), and let $f: X \to \mathbf{R}$
be a function. We say that the series $\sum_{x \in X} f(x)$ is absolutely convergent iff

$$\sup \left\{ \sum_{x \in A} |f(x)| : A \subseteq X, A \text{ finite} \right\} < \infty.$$

Note that we have not yet said what the series $\sum_{x \in X} f(x)$ is equal to. This shall
be accomplished by the following lemma.


Lemma 8.2.5 Let $X$ be a set (which could be uncountable), and let $f: X \to \mathbf{R}$ be
a function such that the series $\sum_{x \in X} f(x)$ is absolutely convergent. Then the set
$\{x \in X : f(x) \neq 0\}$ is at most countable. (This result requires the axiom of choice,
see Sect. 8.4.)


Proof See Exercise 8.2.2. 


Because of this, we can define the value of $\sum_{x \in X} f(x)$ for any absolutely
convergent series on an uncountable set $X$ by the formula

$$\sum_{x \in X} f(x) := \sum_{x \in X : f(x) \neq 0} f(x),$$

since we have replaced a sum on an uncountable set $X$ by a sum on the at most
countable set $\{x \in X : f(x) \neq 0\}$. (Note that if the former sum is absolutely
convergent, then the latter one is also.) Note also that this definition is consistent with the
definitions we already have for series on countable sets.

We give some laws for absolutely convergent series on arbitrary sets. 


Proposition 8.2.6 (Absolutely convergent series laws) Let $X$ be an arbitrary set
(possibly uncountable), and let $f: X \to \mathbf{R}$ and $g: X \to \mathbf{R}$ be functions such that
the series $\sum_{x \in X} f(x)$ and $\sum_{x \in X} g(x)$ are both absolutely convergent.


(a) The series $\sum_{x \in X} (f(x) + g(x))$ is absolutely convergent, and

$$\sum_{x \in X} (f(x) + g(x)) = \sum_{x \in X} f(x) + \sum_{x \in X} g(x).$$

(b) If $c$ is a real number, then $\sum_{x \in X} c f(x)$ is absolutely convergent, and

$$\sum_{x \in X} c f(x) = c \sum_{x \in X} f(x).$$

(c) If $X = X_1 \cup X_2$ for some disjoint sets $X_1$ and $X_2$, then $\sum_{x \in X_1} f(x)$ and
$\sum_{x \in X_2} f(x)$ are absolutely convergent, and

$$\sum_{x \in X_1 \cup X_2} f(x) = \sum_{x \in X_1} f(x) + \sum_{x \in X_2} f(x).$$

Conversely, if $h: X \to \mathbf{R}$ is such that $\sum_{x \in X_1} h(x)$ and $\sum_{x \in X_2} h(x)$ are
absolutely convergent, then $\sum_{x \in X_1 \cup X_2} h(x)$ is also absolutely convergent, and

$$\sum_{x \in X_1 \cup X_2} h(x) = \sum_{x \in X_1} h(x) + \sum_{x \in X_2} h(x).$$

(d) If $Y$ is another set, and $\phi: Y \to X$ is a bijection, then $\sum_{y \in Y} f(\phi(y))$ is
absolutely convergent, and

$$\sum_{y \in Y} f(\phi(y)) = \sum_{x \in X} f(x).$$

(This result requires the axiom of choice when $X$ is uncountable, see Sect. 8.4.)


Proof See Exercise 8.2.3. 


Recall in Example 7.4.4 that if a series was conditionally convergent, then its 
behavior with respect to rearrangements was bad. We now analyze this phenomenon 
further. 


Lemma 8.2.7 Let $\sum_{n=0}^\infty a_n$ be a series of real numbers which is conditionally
convergent (convergent but not absolutely convergent). Define the sets $A_+ := \{n \in \mathbf{N} : a_n \geq 0\}$
and $A_- := \{n \in \mathbf{N} : a_n < 0\}$, thus $A_+ \cup A_- = \mathbf{N}$ and $A_+ \cap A_- = \emptyset$. Then
both of the series $\sum_{n \in A_+} a_n$ and $\sum_{n \in A_-} a_n$ are not absolutely convergent.


Proof See Exercise 8.2.4. 


We are now ready to present a remarkable theorem of Georg Riemann (1826-1866),
which asserts that a series which converges conditionally but not absolutely
can be rearranged to converge to any value one pleases!


Theorem 8.2.8 Let $\sum_{n=0}^\infty a_n$ be a series which is conditionally convergent (i.e.,
convergent, but not absolutely convergent), and let $L$ be any real number. Then there
exists a bijection $f: \mathbf{N} \to \mathbf{N}$ such that $\sum_{m=0}^\infty a_{f(m)}$ converges conditionally to $L$.


Proof (Optional) We give a sketch of the proof, leaving the details to be filled in
in Exercise 8.2.5. Let $A_+$ and $A_-$ be the sets in Lemma 8.2.7; from that lemma
we know that $\sum_{n \in A_+} a_n$ and $\sum_{n \in A_-} a_n$ both fail to be absolutely convergent. In
particular $A_+$ and $A_-$ are infinite (why?). By Proposition 8.1.5 we can then find
increasing bijections $f_+: \mathbf{N} \to A_+$ and $f_-: \mathbf{N} \to A_-$. Thus the sums $\sum_{m=0}^\infty a_{f_+(m)}$
and $\sum_{m=0}^\infty a_{f_-(m)}$ both fail to be absolutely convergent (why?). The plan shall be to
select terms from the divergent series $\sum_{m=0}^\infty a_{f_+(m)}$ and $\sum_{m=0}^\infty a_{f_-(m)}$ in a well-chosen
order in order to keep their difference converging toward $L$.

We define the sequence $n_0, n_1, n_2, \ldots$ of natural numbers recursively as follows.
Suppose that $j$ is a natural number, and that $n_i$ has already been defined for all $i < j$
(this is vacuously true if $j = 0$). We then define $n_j$ by the following rule:

(I) If $\sum_{0 \leq i < j} a_{n_i} < L$, then we set

$$n_j := \min\{n \in A_+ : n \neq n_i \text{ for all } i < j\}.$$

(II) If instead $\sum_{0 \leq i < j} a_{n_i} \geq L$, then we set

$$n_j := \min\{n \in A_- : n \neq n_i \text{ for all } i < j\}.$$

Note that this recursive definition is well-defined because $A_+$ and $A_-$ are infinite,
and so the sets $\{n \in A_+ : n \neq n_i \text{ for all } i < j\}$ and $\{n \in A_- : n \neq n_i \text{ for all } i < j\}$
are never empty. (Intuitively, we add a non-negative number to the
series whenever the partial sum is too low, and add a negative number when the sum
is too high.) One can then verify the following claims:

• The map $j \mapsto n_j$ is injective. (Why?)

• Case I occurs an infinite number of times, and Case II also occurs an infinite
number of times. (Why? Prove by contradiction.)

• The map $j \mapsto n_j$ is surjective. (Why?)

• We have $\lim_{j \to \infty} a_{n_j} = 0$. (Why? Note from Corollary 7.2.6 that $\lim_{n \to \infty} a_n = 0$.)

• We have $\lim_{j \to \infty} \sum_{0 \leq i < j} a_{n_i} = L$. (Why?)

The claim then follows by setting $f(i) := n_i$ for all $i$.
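The selection rule (I)/(II) is simple enough to run on a computer. The following sketch applies it to the alternating harmonic series $a_n = (-1)^n / (n + 1)$ with target $L = 1.5$; both the series and the target are illustrative choices made here, not taken from the text. Because the unused elements of $A_+$ and $A_-$ are always taken in increasing order, two generators suffice to track the minima.

```python
from itertools import count
from typing import Callable, List, Tuple


def rearrange(a: Callable[[int], float], target: float,
              steps: int) -> Tuple[List[int], float]:
    """Apply rules (I)/(II) for `steps` steps.

    Returns the chosen indices n_0, ..., n_{steps-1} and the final partial sum.
    Assumes both A_+ and A_- are infinite, as in the proof.
    """
    nonneg = (n for n in count(0) if a(n) >= 0)   # A_+ in increasing order
    neg = (n for n in count(0) if a(n) < 0)       # A_- in increasing order
    order: List[int] = []
    partial = 0.0
    for _ in range(steps):
        # Rule (I): partial sum too low, take the least unused index of A_+.
        # Rule (II): otherwise take the least unused index of A_-.
        n = next(nonneg) if partial < target else next(neg)
        order.append(n)
        partial += a(n)
    return order, partial


def alternating_harmonic(n: int) -> float:
    """a_n = (-1)^n / (n + 1), so the series is 1 - 1/2 + 1/3 - ..."""
    return (-1) ** n / (n + 1)


if __name__ == "__main__":
    order, partial = rearrange(alternating_harmonic, 1.5, 20_000)
    print(order[:10], partial)
```

In its usual order this series sums to ln(2), roughly 0.693, but the rearranged partial sums settle near 1.5 instead.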


— Exercises — 
Exercise 8.2.1 Prove Lemma 8.2.3. (Hint: you may find Exercise 3.6.3 to be useful.) 


Exercise 8.2.2 Prove Lemma 8.2.5. (Hint: first show that if $M$ is the quantity

$$M := \sup \left\{ \sum_{x \in A} |f(x)| : A \subseteq X, A \text{ finite} \right\}$$

then the sets $\{x \in X : |f(x)| > 1/n\}$ are finite with cardinality at most $Mn$ for every positive integer
$n$. Then use Exercise 8.1.9 (which uses the axiom of choice, see Sect. 8.4).)


Exercise 8.2.3 Prove Proposition 8.2.6. (Hint: you may of course use all the results from Chap. 7
to do this.)


Exercise 8.2.4 Prove Lemma 8.2.7. (Hint: prove by contradiction, and use limit laws.) 
Exercise 8.2.5 Explain the gaps marked (why?) in the proof of Theorem 8.2.8. 


Exercise 8.2.6 Let $\sum_{n=0}^\infty a_n$ be a series which is conditionally convergent (i.e., convergent but
not absolutely convergent). Show that there exists a bijection $f: \mathbf{N} \to \mathbf{N}$ such that $\sum_{m=0}^\infty a_{f(m)}$
diverges to $+\infty$, or more precisely that

$$\liminf_{N \to \infty} \sum_{m=0}^N a_{f(m)} = \limsup_{N \to \infty} \sum_{m=0}^N a_{f(m)} = +\infty.$$

(Of course, a similar statement holds with $+\infty$ replaced by $-\infty$.)
8.3 Uncountable Sets


We have just shown that a lot of infinite sets are countable, even such sets as
the rationals, for which it is not obvious how to arrange as a sequence. After such
examples, one may begin to hope that other infinite sets, such as the real numbers,
are also countable. After all, the real numbers are nothing more than (formal) limits
of the rationals, and we’ve already shown the rationals are countable, so it seems
plausible that the reals are also countable.

It was thus a great shock when Georg Cantor (1845-1918) showed in 1873 that
certain sets, including the real numbers $\mathbf{R}$, are in fact uncountable: no matter how
hard you try, you cannot arrange the real numbers $\mathbf{R}$ as a sequence $a_0, a_1, a_2, \ldots$. (Of
course, the real numbers $\mathbf{R}$ can contain many infinite sequences, e.g., the sequence
$0, 1, 2, 3, 4, \ldots$. However, what Cantor proved is that no such sequence can ever
exhaust the real numbers; no matter what sequence of real numbers you choose,
there will always be some real numbers that are not covered by that sequence.)

Recall from Remark 3.4.11 that if $X$ is a set, then the power set of $X$, denoted
$2^X := \{A : A \subseteq X\}$, is the set of all subsets of $X$. Thus for instance
$2^{\{1,2\}} = \{\emptyset, \{1\}, \{2\}, \{1, 2\}\}$. The reason for the notation $2^X$ is given in Exercise 8.3.1.


Theorem 8.3.1 (Cantor’s theorem) Let X be an arbitrary set (finite or infinite). Then 
the sets X and 2* cannot have equal cardinality. 


Proof Suppose for sake of contradiction that the sets X and 2^X had equal cardinality. 
Then there exists a bijection f : X → 2^X between X and the power set of X. Now 
consider the set 

A := {x ∈ X : x ∉ f(x)}. 


Note that this set is well-defined since f(x) is an element of 2^X and is hence a 
subset of X. Clearly A is a subset of X, hence is an element of 2^X. Since f is a 
bijection, there must therefore exist x ∈ X such that f(x) = A. There are now two 
cases, depending on whether x ∈ A or x ∉ A. If x ∈ A, then by definition of A we 
have x ∉ f(x), hence x ∉ A, a contradiction. But if x ∉ A, then x ∉ f(x), hence 
by definition of A we have x ∈ A, a contradiction. Thus in either case we have a 
contradiction. 
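
The diagonal argument can be checked by brute force when X is small. The following Python sketch is only an illustration, not part of the proof: the three-element set X and the helper names power_set and diagonal_set are assumptions made for this example. It enumerates every function f : X → 2^X and confirms that the set A := {x ∈ X : x ∉ f(x)} is never one of the values of f, so no such f is surjective.

```python
from itertools import combinations, product

def power_set(xs):
    """Return all subsets of xs as a list of frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def diagonal_set(f, xs):
    """The set A := {x in X : x not in f(x)} used in the proof."""
    return frozenset(x for x in xs if x not in f[x])

X = [0, 1, 2]
subsets = power_set(X)  # the 8 elements of 2^X

# Enumerate every function f : X -> 2^X (8^3 = 512 of them) and check
# that the diagonal set A is never a value of f.
missed_every_time = all(
    diagonal_set(dict(zip(X, values)), X) not in values
    for values in product(subsets, repeat=len(X))
)
print(missed_every_time)  # True
```

Of course a finite check of this kind proves nothing about infinite sets; it only shows the mechanism by which A escapes the range of f.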


Remark 8.3.2 The reader should compare the proof of Cantor's theorem with the 
statement of Russell's paradox (Sect. 3.2). The point is that a bijection between X 
and 2^X would come dangerously close to the concept of a set X "containing itself". 


Corollary 8.3.3 2^N is uncountable. 


Proof By Theorem 8.3.1, 2^N cannot have equal cardinality with N, hence is either 
uncountable or finite. However, 2^N contains as a subset the set of singletons {{n} : 
n ∈ N}, which is clearly bijective to N and hence countably infinite. Thus 2^N cannot 
be finite (by Proposition 3.6.14) and is hence uncountable. 


Cantor’s theorem has the following important (and unintuitive) consequence. 
Corollary 8.3.4 R is uncountable. 


Proof Let us define the map f : 2^N → R by the formula 

\[ f(A) := \sum_{n \in A} 10^{-n}. \]

Observe that since ∑_{n=0}^∞ 10^{-n} is an absolutely convergent series (by Lemma 7.3.3), 
the series ∑_{n∈A} 10^{-n} is also absolutely convergent (by Proposition 8.2.6(c)). Thus 
the map f is well-defined. We now claim that f is injective. Suppose for sake of 
contradiction that there were two distinct sets A, B ∈ 2^N such that f(A) = f(B). 
Since A ≠ B, the set (A\B) ∪ (B\A) is a non-empty subset of N. By the well-ordering 
principle (Proposition 8.1.4), we can then define the minimum of this set, 
say n_0 := min((A\B) ∪ (B\A)). Thus n_0 either lies in A\B or B\A. By symmetry 
we may assume it lies in A\B. Then n_0 ∈ A, n_0 ∉ B, and for all n < n_0 we either 
have n ∈ A, B or n ∉ A, B. Thus 

\[
\begin{aligned}
0 &= f(A) - f(B) \\
  &= \sum_{n \in A} 10^{-n} - \sum_{n \in B} 10^{-n} \\
  &= \Bigl( \sum_{n < n_0 : n \in A} 10^{-n} + 10^{-n_0} + \sum_{n > n_0 : n \in A} 10^{-n} \Bigr) \\
  &\qquad - \Bigl( \sum_{n < n_0 : n \in B} 10^{-n} + \sum_{n > n_0 : n \in B} 10^{-n} \Bigr) \\
  &= 10^{-n_0} + \sum_{n > n_0 : n \in A} 10^{-n} - \sum_{n > n_0 : n \in B} 10^{-n} \\
  &\geq 10^{-n_0} + 0 - \sum_{n > n_0} 10^{-n} \\
  &\geq 10^{-n_0} - \frac{1}{9} 10^{-n_0} \\
  &> 0,
\end{aligned}
\]

a contradiction, where we have used the geometric series lemma (Lemma 7.3.3) to 
sum 

\[ \sum_{n > n_0} 10^{-n} = \sum_{m=0}^{\infty} 10^{-(n_0+1+m)} = 10^{-n_0-1} \sum_{m=0}^{\infty} 10^{-m} = \frac{1}{9} 10^{-n_0}. \]

Thus f is injective, which means that f(2^N) has the same cardinality as 2^N and is 
thus uncountable. Since f(2^N) is a subset of R, this forces R to be uncountable also 
(otherwise this would contradict Corollary 8.1.7), and we are done. 
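
The two ingredients of this proof, the map f and the geometric tail bound, can be explored numerically on finite sets. The sketch below is an illustration under stated assumptions: it only handles finite subsets A of {0, ..., 5}, uses exact rational arithmetic, and the names universe, partial_tail and bound are chosen for this example. It checks that f takes distinct values on distinct subsets in the sample, and that partial sums of ∑_{n>n_0} 10^{-n} stay below (1/9) 10^{-n_0}.

```python
from fractions import Fraction
from itertools import combinations

def f(A):
    """f(A) := sum of 10^(-n) over n in A, computed exactly for a finite A."""
    return sum((Fraction(1, 10 ** n) for n in A), Fraction(0))

# All 64 subsets of {0, ..., 5}.
universe = range(6)
subsets = [frozenset(c) for r in range(7) for c in combinations(universe, r)]

# Injectivity on the sample: 64 subsets give 64 distinct values.
values = {f(A) for A in subsets}
injective_on_sample = len(values) == len(subsets)

# Tail bound from the proof: a partial sum of sum_{n > n0} 10^(-n)
# is strictly smaller than the limit (1/9) * 10^(-n0).
n0 = 2
partial_tail = f(range(n0 + 1, n0 + 30))
bound = Fraction(1, 9) * Fraction(1, 10 ** n0)

print(injective_on_sample, partial_tail < bound)  # True True
```

The value f(A) is just the decimal number whose n-th digit is 1 when n ∈ A and 0 otherwise, which is one way to see why distinct sets give distinct reals.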


Remark 8.3.5 We will give another proof of this result using measure theory in 
Exercise 7.2.6 of Analysis II. 


Remark 8.3.6 Corollary 8.3.4 shows that the reals have strictly larger cardinality 
than the natural numbers (in the sense of Exercise 3.6.7). One could ask whether 
there exist any sets which have strictly larger cardinality than the natural numbers, 
but strictly smaller cardinality than the reals. The Continuum Hypothesis asserts that 
no such sets exist. Interestingly, it was shown in separate works of Kurt Gödel (1906-1978) 
and Paul Cohen (1934-2007) that this hypothesis is independent of the other 
axioms of set theory; it can neither be proved nor disproved in that set of axioms 
(unless those axioms are inconsistent, which is highly unlikely). 


— Exercises — 


Exercise 8.3.1 Let X be a finite set of cardinality n. Show that 2^X is a finite set of cardinality 2^n. 
(Hint: use induction on n.) 


Exercise 8.3.2 Let A, B, C be sets such that A ⊆ B ⊆ C, and suppose that there is an injection 
f : C → A. Define the sets D_0, D_1, D_2, ... recursively by setting D_0 := B\A, and then D_{n+1} := 
f(D_n) for all natural numbers n. Prove that the sets D_0, D_1, ... are all disjoint from each other (i.e., 
D_n ∩ D_m = ∅ whenever n ≠ m). Also show that if g : A → B is the function defined by setting 
g(x) := f^{-1}(x) when x ∈ ⋃_{n=1}^∞ D_n, and g(x) := x when x ∉ ⋃_{n=1}^∞ D_n, then g does indeed map 
A to B and is a bijection between the two. In particular, A and B have the same cardinality. 


Exercise 8.3.3 Recall from Exercise 3.6.7 that a set A is said to have lesser or equal cardinality 
than a set B iff there is an injective map f : A → B from A to B. Using Exercise 8.3.2, show that 
if A, B are sets such that A has lesser or equal cardinality to B and B has lesser or equal cardinality 
to A, then A and B have equal cardinality. (This is known as the Schröder-Bernstein theorem, after 
Ernst Schröder (1841-1902) and Felix Bernstein (1878-1956).) 


Exercise 8.3.4 Let us say that a set A has strictly lesser cardinality than a set B if A has lesser 
than or equal cardinality to B (in the sense of Exercise 3.6.7) but A does not have equal cardinality 
to B. Show that for any set X, that X has strictly lesser cardinality than 2^X. Also, show that if A 
has strictly lesser cardinality than B, and B has strictly lesser cardinality than C, then A has strictly 
lesser cardinality than C. 


Exercise 8.3.5 Show that no power set (i.e., a set of the form 2^X for some set X) can be countably 
infinite. 


8.4 The Axiom of Choice 


We now discuss the final axiom of the standard Zermelo—Fraenkel—Choice system 
of set theory, namely the axiom of choice. We have delayed introducing this axiom 
for a while now, to demonstrate that a large portion of the foundations of analysis 
can be constructed without appealing to this axiom. However, in many further 
developments of the theory, it is very convenient (and in some cases even essential) to 
employ this powerful axiom. On the other hand, the axiom of choice can lead to 
a number of unintuitive consequences (for instance the Banach–Tarski paradox, a 
simplified version of which we will encounter in Sect. 7.3) and can lead to proofs 
that are philosophically somewhat unsatisfying. Nevertheless, the axiom is almost 
universally accepted by mathematicians. One reason for this confidence is a theorem 
due to the great logician Kurt Gödel, who showed that a result proven using the axiom 
of choice will never contradict a result proven without the axiom of choice (unless all 
the other axioms of set theory are themselves inconsistent, which is highly unlikely). 
More precisely, Gödel demonstrated that the axiom of choice is undecidable; it can 
neither be proved nor disproved from the other axioms of set theory, so long as those 
axioms are themselves consistent. (From a set of inconsistent axioms one can prove 
that every statement is both true and false.) In practice, this means that any “real-life” 
application of analysis (more precisely, any application involving only “decidable” 
questions) which can be rigorously supported using the axiom of choice, can also be 
rigorously supported without the axiom of choice, though in many cases it would take 
a much more complicated and lengthier argument to do so if one were not allowed to 
use the axiom of choice. Thus one can view the axiom of choice as a convenient and 
safe labor-saving device in analysis. In other disciplines of mathematics, notably in 
set theory in which many of the questions are not decidable, the issue of whether to 
accept the axiom of choice is more open to debate and involves some philosophical 
concerns as well as mathematical and logical ones. However, we will not discuss 
these issues in this text. 

We begin by generalizing the notion of finite Cartesian products from Definition 
3.5.6 to infinite Cartesian products. 


Definition 8.4.1 (Infinite Cartesian products) Let I be a set (possibly infinite), and 
for each α ∈ I let X_α be a set. We then define the Cartesian product ∏_{α∈I} X_α to be 
the set 

\[ \prod_{\alpha \in I} X_\alpha = \Bigl\{ (x_\alpha)_{\alpha \in I} \in \Bigl( \bigcup_{\beta \in I} X_\beta \Bigr)^{I} : x_\alpha \in X_\alpha \text{ for all } \alpha \in I \Bigr\}, \]

where we recall (from Axiom 3.11) that (⋃_{α∈I} X_α)^I is the set of all functions (x_α)_{α∈I} 
which assign an element x_α ∈ ⋃_{β∈I} X_β to each α ∈ I. Thus ∏_{α∈I} X_α is a subset of 
that set of functions, consisting instead of those functions (x_α)_{α∈I} which assign an 
element x_α ∈ X_α to each α ∈ I. 


Example 8.4.2 For any sets I and X, we have ∏_{α∈I} X = X^I (why?). If I is a set 
of the form I := {i ∈ N : 1 ≤ i ≤ n}, then ∏_{α∈I} X_α is essentially the same set as 
the set ∏_{1≤i≤n} X_i defined in Definition 3.5.6, in the sense that there is a canonical 
bijection between the two sets (why?). 
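
For a finite index set I and finite sets X_α, the Cartesian product can be listed explicitly. The following Python sketch is an illustration only; the function name cartesian_product and the sample family are assumptions made for this example. It represents each element (x_α)_{α∈I} as a dictionary mapping an index α to the chosen element x_α ∈ X_α.

```python
from itertools import product

def cartesian_product(family):
    """List every function alpha -> x_alpha with x_alpha in family[alpha].

    `family` is a dict {index: finite set}; each element of the product
    is returned as a dict with the same keys.
    """
    indices = sorted(family)
    return [
        dict(zip(indices, choice))
        for choice in product(*(sorted(family[a]) for a in indices))
    ]

family = {"a": {1, 2}, "b": {3}, "c": {4, 5}}
elements = cartesian_product(family)
print(len(elements))   # 4
print(elements[0])     # {'a': 1, 'b': 3, 'c': 4}

# If one of the sets is empty, the product is empty as well.
print(cartesian_product({"a": {1}, "b": set()}))  # []
```

In this finite setting the non-emptiness of the product (when every X_α is non-empty) is visible by inspection; no axiom is needed to produce the list.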


Recall from Lemma 3.5.11 that if X_1, ..., X_n were any finite collection of non-empty 
sets, then the finite Cartesian product ∏_{1≤i≤n} X_i was also non-empty. The 
axiom of choice asserts that this statement is also true for infinite Cartesian products: 

Axiom 8.1 (Choice) Let I be a set, and for each α ∈ I, let X_α be a non-empty set. 
Then ∏_{α∈I} X_α is also non-empty. In other words, there exists a function (x_α)_{α∈I} 
which assigns to each α ∈ I an element x_α ∈ X_α. 


Remark 8.4.3 The intuition behind this axiom is that given a (possibly infinite) 
collection of non-empty sets X_α, one should be able to choose a single element x_α 
from each one, and then form the possibly infinite tuple (x_α)_{α∈I} from all the choices 
one has made. On the one hand, this is a very intuitively appealing axiom; in some 
sense one is just applying Lemma 3.1.5 over and over again. On the other hand, the 
fact that one is making an infinite number of arbitrary choices, with no explicit rule 
as to how to make these choices, is a little disconcerting. Indeed, there are many 
theorems proven using the axiom of choice which assert the abstract existence of 
some object x with certain properties, without saying at all what that object is, or 
how to construct it. Thus the axiom of choice can lead to proofs which are non- 
constructive—demonstrating existence of an object without actually constructing 
the object explicitly. This problem is not unique to the axiom of choice—it already 
appears for instance in Lemma 3.1.5—but the objects shown to exist using the axiom 
of choice tend to be rather extreme in their level of non-constructiveness. However, as 
long as one is aware of the distinction between a non-constructive existence statement, 
and a constructive existence statement (with the latter being preferable, but not strictly 
necessary in many cases), there is no difficulty here, except perhaps on a philosophical 
level. 


Remark 8.4.4 There are many equivalent formulations of the axiom of choice; we 
give some of these in the exercises below. 


In analysis one often does not need the full power of the axiom of choice. Instead, 
one often only needs the axiom of countable choice, which is the same as the axiom 
of choice but with the index set I restricted to be at most countable. We give a typical 
example of this below. 


Lemma 8.4.5 Let E be a non-empty subset of the real line with sup(E) < ∞ (i.e., 
E is bounded from above). Then there exists a sequence (a_n)_{n=1}^∞ whose elements a_n 
all lie in E, such that lim_{n→∞} a_n = sup(E). 


Proof For each positive natural number n, let X_n denote the set 

X_n := {x ∈ E : sup(E) − 1/n ≤ x ≤ sup(E)}. 


Since sup(E) is the least upper bound for E, then sup(E) − 1/n cannot be an upper 
bound for E, and so X_n is non-empty for each n. Using the axiom of choice (or 
the axiom of countable choice), we can then find a sequence (a_n)_{n=1}^∞ such that a_n ∈ 
X_n for all n ≥ 1. In particular a_n ∈ E for all n, and sup(E) − 1/n ≤ a_n ≤ sup(E) 
for all n. But then we have lim_{n→∞} a_n = sup(E) by the squeeze test (Corollary 
6.4.14). 
Remark 8.4.6 In many special cases, one can obtain the conclusion of this lemma 
without using the axiom of choice. For instance, if E is a closed set (Definition 
1.2.12), then one can define a_n without choice by the formula a_n := inf(X_n); the 
extra hypothesis that E is closed will ensure that a_n lies in E. 
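
The choice-free recipe a_n := inf(X_n) can be tried out on a toy example. The sketch below is an illustration under a strong simplifying assumption: E is a small finite set of rationals (finite sets are closed, and their supremum and infimum are just the maximum and minimum), and the particular set E and the function name a are chosen for this example only.

```python
from fractions import Fraction

# A finite (hence closed) set E; for a finite set, sup(E) = max(E).
E = [Fraction(0), Fraction(1, 2), Fraction(3, 4), Fraction(7, 8), Fraction(1)]
sup_E = max(E)

def a(n):
    """a_n := inf(X_n), where X_n = {x in E : sup(E) - 1/n <= x <= sup(E)}."""
    X_n = [x for x in E if sup_E - Fraction(1, n) <= x <= sup_E]
    return min(X_n)  # X_n is finite and non-empty, so its infimum is its minimum

sequence = [a(n) for n in range(1, 11)]
print([str(x) for x in sequence])
# ['0', '1/2', '3/4', '3/4', '7/8', '7/8', '7/8', '7/8', '1', '1']
```

Each a_n is determined by an explicit rule rather than by an arbitrary selection, which is the point of the remark; the sequence lies in E and is squeezed between sup(E) − 1/n and sup(E).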


Another formulation of the axiom of choice is as follows. 


Proposition 8.4.7 Let X and Y be sets, and let P(x, y) be a property pertaining to 
an object x ∈ X and an object y ∈ Y such that for every x ∈ X there is at least one 
y ∈ Y such that P(x, y) is true. Then there exists a function f : X → Y such that 
P(x, f(x)) is true for all x ∈ X. 


Proof See Exercise 8.4.1. 


— Exercises — 


Exercise 8.4.1 Show that the axiom of choice implies Proposition 8.4.7. (Hint: consider the sets 
Y_x := {y ∈ Y : P(x, y) is true} for each x ∈ X.) Conversely, show that if Proposition 8.4.7 is true, 
then the axiom of choice is also true. 


Exercise 8.4.2 Let I be a set, and for each α ∈ I let X_α be a non-empty set. Suppose that all the 
sets X_α are disjoint from each other, i.e., X_α ∩ X_β = ∅ for all distinct α, β ∈ I. Using the axiom of 
choice, show that there exists a set Y such that #(Y ∩ X_α) = 1 for all α ∈ I (i.e., Y intersects each 
X_α in exactly one element). Conversely, show that if the above statement was true for an arbitrary 
choice of sets I and non-empty disjoint sets X_α, then the axiom of choice is true. (Hint: the problem 
is that in Axiom 8.1 the sets X_α are not assumed to be disjoint. But this can be fixed by the trick of 
looking at the sets {α} × X_α = {(α, x) : x ∈ X_α} instead.) 


Exercise 8.4.3 Let A and B be sets such that there exists a surjection g : B → A. Using the axiom 
of choice, show that there then exists an injection f : A → B with g ∘ f : A → A the identity map; 
in particular, A has lesser or equal cardinality to B in the sense of Exercise 3.6.7. (Hint: consider 
the inverse images g^{-1}({a}) for each a ∈ A.) Compare this with Exercise 3.6.8. Conversely, show 
that if the above statement is true for arbitrary sets A, B and surjections g : B → A, then the axiom 
of choice is true. (Hint: use Exercise 8.4.2.) 


8.5 Ordered Sets 


The axiom of choice is intimately connected to the theory of ordered sets. There are 
actually many types of ordered sets; we will concern ourselves with three such types, 
the partially ordered sets, the totally ordered sets, and the well-ordered sets. 


Definition 8.5.1 (Partially ordered sets) A partially ordered set (or poset) is a set X, 
together¹ with a relation ≤_X on X (thus for any two objects x, y ∈ X, the statement 
x ≤_X y is either a true statement or a false statement). Furthermore, this relation is 
assumed to obey the following three properties: 

• (Reflexivity) For any x ∈ X, we have x ≤_X x. 
• (Antisymmetry) If x, y ∈ X are such that x ≤_X y and y ≤_X x, then x = y. 
• (Transitivity) If x, y, z ∈ X are such that x ≤_X y and y ≤_X z, then x ≤_X z. 

¹ Strictly speaking, a partially ordered set is not a set X, but rather a pair (X, ≤_X). But in many 
cases the ordering ≤_X will be clear from context, and so we shall refer to X itself as the partially 
ordered set even though this is technically incorrect. 


We refer to ≤_X as the ordering relation. In most situations it is understood what the 
set X is from context, and in those cases we shall simply write ≤ instead of ≤_X. We 
write x <_X y (or x < y for short) if x ≤_X y and x ≠ y. 


Examples 8.5.2 The natural numbers N together with the usual less-than-or-equal-to 
relation ≤ (as defined in Definition 2.2.11) forms a partially ordered set, by 
Proposition 2.2.12. Similar arguments (using the appropriate definitions and propositions) 
show that the integers Z, the rationals Q, the reals R, and the extended reals R* are 
also partially ordered sets. Meanwhile, if X is any collection of sets, and one uses the 
relation of is-a-subset-of ⊆ (as defined in Definition 3.1.14) for the ordering relation 
≤_X, then X is also partially ordered (Proposition 3.1.17). Note that it is certainly 
possible to give these sets a different partial ordering than the standard one; see for 
instance Exercise 8.5.3. 


Definition 8.5.3 (Totally ordered set) Let X be a partially ordered set with some 
order relation ≤_X. A subset Y of X is said to be totally ordered if, given any two 
y, y' ∈ Y, we either have y ≤_X y' or y' ≤_X y (or both). If X itself is totally ordered, 
we say that X is a totally ordered set (or chain) with order relation ≤_X. 


Examples 8.5.4 The natural numbers N, the integers Z, the rationals Q, reals R, and 
the extended reals R*, all with the usual ordering relation ≤, are totally ordered (by 
Proposition 2.2.13, Lemma 4.1.11, Proposition 4.2.9, Proposition 5.4.7, and 
Proposition 6.2.5, respectively). Also, any subset of a totally ordered set is again totally 
ordered (why?). On the other hand, a collection of sets with the ⊆ relation is usually 
not totally ordered. For instance, if X is the set {{1, 2}, {2}, {2, 3}, {2, 3, 4}, {5}}, 
ordered by the set inclusion relation ⊆, then the elements {1, 2} and {2, 3} of X are 
not comparable to each other (i.e., {1, 2} ⊈ {2, 3} and {2, 3} ⊈ {1, 2}). 
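
On a finite collection, the axioms of a partial ordering and the extra condition for a total ordering can be verified by exhaustive checking. The Python sketch below is only an illustration; the helper names is_partial_order, is_totally_ordered and subset are assumptions made for this example. It confirms that the collection X above is partially but not totally ordered by inclusion, while the sub-collection {{2}, {2, 3}, {2, 3, 4}} is totally ordered.

```python
def is_partial_order(elements, leq):
    """Check reflexivity, antisymmetry and transitivity of leq on a finite collection."""
    xs = list(elements)
    reflexive = all(leq(x, x) for x in xs)
    antisymmetric = all(
        x == y for x in xs for y in xs if leq(x, y) and leq(y, x)
    )
    transitive = all(
        leq(x, z)
        for x in xs for y in xs for z in xs
        if leq(x, y) and leq(y, z)
    )
    return reflexive and antisymmetric and transitive

def is_totally_ordered(elements, leq):
    """Check that any two elements of the collection are comparable."""
    xs = list(elements)
    return all(leq(x, y) or leq(y, x) for x in xs for y in xs)

def subset(a, b):
    """The is-a-subset-of relation on frozensets."""
    return a <= b

X = [frozenset(s) for s in ({1, 2}, {2}, {2, 3}, {2, 3, 4}, {5})]
chain = [frozenset({2}), frozenset({2, 3}), frozenset({2, 3, 4})]

print(is_partial_order(X, subset))        # True
print(is_totally_ordered(X, subset))      # False
print(is_totally_ordered(chain, subset))  # True
```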


Definition 8.5.5 (Maximal and minimal elements) Let X be a partially ordered set, 
and let Y be a subset of X. We say that y is a minimal element of Y if y ∈ Y and 
there is no element y' ∈ Y such that y' < y. We say that y is a maximal element of 
Y if y ∈ Y and there is no element y' ∈ Y such that y < y'. 


Example 8.5.6 Using the set X from the previous example, {2} is a minimal element, 
{1,2} and {2, 3,4} are maximal elements, {5} is both a minimal and a maximal 
element, and {2, 3} is neither a minimal nor a maximal element. This example shows 
that a partially ordered set can have multiple maxima and minima; however, a totally 
ordered set cannot (Exercise 8.5.7). 
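
The claims in Example 8.5.6 can be reproduced mechanically from Definition 8.5.5. The following Python sketch is an illustration only; the function names minimal_elements and maximal_elements are assumptions made for this example, and the strict relation y' < y is encoded as "y' ⊆ y and y' ≠ y".

```python
def minimal_elements(Y, leq):
    """Elements y of Y for which no z in Y satisfies z < y."""
    return [y for y in Y if not any(leq(z, y) and z != y for z in Y)]

def maximal_elements(Y, leq):
    """Elements y of Y for which no z in Y satisfies y < z."""
    return [y for y in Y if not any(leq(y, z) and z != y for z in Y)]

def subset(a, b):
    """The is-a-subset-of relation on frozensets."""
    return a <= b

X = [frozenset(s) for s in ({1, 2}, {2}, {2, 3}, {2, 3, 4}, {5})]

minimal = minimal_elements(X, subset)
maximal = maximal_elements(X, subset)

print([sorted(s) for s in minimal])  # [[2], [5]]
print([sorted(s) for s in maximal])  # [[1, 2], [2, 3, 4], [5]]
```

Note that {2, 3} appears in neither list, in agreement with the example.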


Example 8.5.7 The natural numbers N (ordered by ≤) have a minimal element, 
namely 0, but no maximal element. The set of integers Z has no maximal and no 
minimal element. 
Definition 8.5.8 (Well-ordered sets) Let X be a partially ordered set, and let Y be a 
totally ordered subset of X. We say that Y is well-ordered if every non-empty subset 
Z of Y has a minimal element min(Z). 


Examples 8.5.9 The natural numbers N are well-ordered by Proposition 8.1.4. 
However, the integers Z, the rationals Q, and the real numbers R are not (see Exercise 
8.1.2). Every finite totally ordered set is well-ordered (Exercise 8.5.8). Every subset 
of a well-ordered set is again well-ordered (why?). 


One advantage of well-ordered sets is that they automatically obey a principle of 
strong induction (cf. Proposition 2.2.14): 


Proposition 8.5.10 (Principle of strong induction) Let X be a well-ordered set with 
an ordering relation ≤, and let P(n) be a property pertaining to an element n ∈ X 
(i.e., for each n ∈ X, P(n) is either a true statement or a false statement). Suppose 
that for every n ∈ X, we have the following implication: if P(m) is true for all m ∈ X 
with m <_X n, then P(n) is also true. Then P(n) is true for all n ∈ X. 


Remark 8.5.11 It may seem strange that there is no “base” case in strong induction, 
corresponding to the hypothesis P(0) in Axiom 2.5. However, such a base case is 
automatically included in the strong induction hypothesis. Indeed, if 0 is the minimal 
element of X, then by specializing the hypothesis "if P(m) is true for all m ∈ X with 
m <_X n, then P(n) is also true" to the n = 0 case, we automatically obtain that P(0) 
is true. (Why?) 


Proof See Exercise 8.5.10. 


So far we have not seen the axiom of choice play any role. This will come in once 
we introduce the notion of an upper bound and a strict upper bound. 


Definition 8.5.12 (Upper bounds and strict upper bounds) Let X be a partially 
ordered set with ordering relation ≤, and let Y be a subset of X. If x ∈ X, we say 
that x is an upper bound for Y iff y ≤ x for all y ∈ Y. If in addition x ∉ Y, we say 
that x is a strict upper bound for Y. Equivalently, x is a strict upper bound for Y iff 
y < x for all y ∈ Y. (Why is this equivalent?) 


Example 8.5.13 Let us work in the real number system R with the usual ordering 
≤. Then 2 is an upper bound for the set {x ∈ R : 1 ≤ x ≤ 2} but is not a strict upper 
bound. The number 3, on the other hand, is a strict upper bound for this set. 


Lemma 8.5.14 Let X be a partially ordered set with ordering relation ≤, and let x_0 
be an element of X. Then there is a well-ordered subset Y of X which has x_0 as its 
minimal element and which has no strict upper bound. 


Proof The intuition behind this lemma is that one is trying to perform the following 
algorithm: we initialize Y := {x_0}. If Y has no strict upper bound, then we are done; 
otherwise, we choose a strict upper bound and add it to Y. Then we look again to see 
if Y has a strict upper bound or not. If not, we are done; otherwise we choose another 
strict upper bound and add it to Y. We continue this algorithm "infinitely often" 
until we exhaust all the strict upper bounds; the axiom of choice comes in because 
infinitely many choices are involved. This is however not a rigorous proof because 
it is quite difficult to precisely pin down what it means to perform an algorithm 
"infinitely often". Instead, what we will do is that we will isolate a collection of 
"partially completed" sets Y, which we shall call good sets, and then take the union 
of all these good sets to obtain a "completed" object Y_∞ which will indeed have no 
strict upper bound. 
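
When X is finite, the informal algorithm just described terminates after finitely many steps and can be run literally. The Python sketch below is an illustration of that finite case only, not of the rigorous argument: the function names, the rule "pick the first candidate" (standing in for an arbitrary choice), and the example poset of divisors of 12 ordered by divisibility are all assumptions made for this example.

```python
def strict_upper_bounds(chain, elements, leq):
    """Elements x not in the chain with y <= x for every y in the chain."""
    return [
        x for x in elements
        if x not in chain and all(leq(y, x) for y in chain)
    ]

def grow_chain(x0, elements, leq):
    """Start from Y = {x0} and keep adjoining a strict upper bound while one exists."""
    chain = [x0]
    while True:
        candidates = strict_upper_bounds(chain, elements, leq)
        if not candidates:
            return chain  # Y has no strict upper bound: done
        # An arbitrary choice has to be made here; we simply take the first candidate.
        chain.append(candidates[0])

def divides(a, b):
    """The divisibility ordering on positive integers."""
    return b % a == 0

divisors_of_12 = [1, 2, 3, 4, 6, 12]
Y = grow_chain(1, divisors_of_12, divides)
print(Y)  # [1, 2, 4, 12]
```

A different selection rule could produce a different chain, for instance 1, 3, 6, 12; the lemma only asserts that some such set exists. In the infinite setting the loop need not terminate, which is why the proof proper replaces it with the union of good sets.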

We now begin the rigorous proof. Suppose for sake of contradiction that every 
well-ordered subset Y of X which has x_0 as its minimal element has at least one strict 
upper bound. Using the axiom of choice (in the form of Proposition 8.4.7), we can 
thus assign a strict upper bound s(Y) ∈ X to each well-ordered subset Y of X which 
has x_0 as its minimal element. 

Henceforth we fix a single such strict upper bound function s. Let us define a 
special class of subsets Y of X. We say that a subset Y of X is good iff it is 
well-ordered, contains x_0 as its minimal element, and obeys the property that 

x = s({y ∈ Y : y < x}) for all x ∈ Y\{x_0}. 


Note that if x ∈ Y\{x_0} then the set {y ∈ Y : y < x} is a subset of X which is 
well-ordered and contains x_0 as its minimal element. Let Ω := {Y ⊆ X : Y is good} be 
the collection of all good subsets of X. This collection is not empty, since the subset 
{x_0} of X is clearly good (why?). 

We make the following important observation: if Y and Y' are two good subsets 
of X, then every element of Y'\Y is a strict upper bound for Y, and every element 
of Y\Y' is a strict upper bound for Y' (Exercise 8.5.13). In particular, given any two 
good sets Y and Y', at least one of Y'\Y and Y\Y' must be empty (since they are 
both strict upper bounds of each other). In other words, Ω is totally ordered by set 
inclusion: given any two good sets Y and Y', either Y ⊆ Y' or Y' ⊆ Y. 

Let Y_∞ := ⋃Ω, i.e., Y_∞ is the set of all elements of X which belong to at least 
one good subset of X. Clearly x_0 ∈ Y_∞. Also, since each good subset of X has x_0 as 
its minimal element, the set Y_∞ also has x_0 as its minimal element (why?). 

Next, we show that Y_∞ is totally ordered. Let x, x' be two elements of Y_∞. By 
definition of Y_∞, we know that x lies in some good set Y and x' lies in some good 
set Y'. But since Ω is totally ordered, one of these good sets contains the other. Thus 
x, x' are contained in a single good set (either Y or Y'); since good sets are totally 
ordered, we thus see that either x ≤ x' or x' ≤ x as desired. 

Next, we show that Y_∞ is well-ordered. Let A be any non-empty subset of Y_∞. 
Then we can pick an element a ∈ A, which then lies in Y_∞. Therefore there is a 
good set Y such that a ∈ Y. Then A ∩ Y is a non-empty subset of Y; since Y is 
well-ordered, the set A ∩ Y thus has a minimal element, call it b. Now recall that for 
any other good set Y', every element of Y'\Y is a strict upper bound for Y, and in 
particular is larger than b. Since b is a minimal element of A ∩ Y, this implies that 
b is also a minimal element of A ∩ Y' for any good set Y' with A ∩ Y' ≠ ∅ (why?). 
Since every element of A belongs to Y_∞ and hence belongs to at least one good set 
Y', we thus see that b is a minimal element of A. Thus Y_∞ is well-ordered as claimed. 

Since Y_∞ is well-ordered with x_0 as its minimal element, it has a strict upper 
bound s(Y_∞). But then Y_∞ ∪ {s(Y_∞)} is well-ordered (why? see Exercise 8.5.11) 
and has x_0 as its minimal element (why?). 

We now claim that Y_∞ ∪ {s(Y_∞)} is good. By the preceding discussion, it suffices 
to show that x = s({y ∈ Y_∞ ∪ {s(Y_∞)} : y < x}) when x ∈ (Y_∞ ∪ {s(Y_∞)})\{x_0}. If 
x = s(Y_∞), this is clear since {y ∈ Y_∞ ∪ {s(Y_∞)} : y < x} = Y_∞ in this case. If 
instead x ∈ Y_∞, then x ∈ Y for some good Y. Then the set {y ∈ Y_∞ ∪ {s(Y_∞)} : 
y < x} is equal to {y ∈ Y : y < x} (why? Use the previous observation that every 
element of Y'\Y is an upper bound for x for every good Y'), and the claim then 
follows since Y is good. 

By definition of Y_∞, we conclude that the good set Y_∞ ∪ {s(Y_∞)} is contained in 
Y_∞. But this is a contradiction since s(Y_∞) is a strict upper bound for Y_∞. Thus we 
have constructed a set with no strict upper bound, as desired. 


The above lemma has the following important consequence: 


Lemma 8.5.15 (Zorn’s lemma) Let X be a non-empty partially ordered set, with 
the property that every non-empty totally ordered subset Y of X has an upper bound. 
Then X contains at least one maximal element. 


Proof See Exercise 8.5.14. 

We give some applications of Zorn's lemma in the exercises below. 


— Exercises — 


Exercise 8.5.1 Consider the empty set ∅ with the empty order relation ≤_∅ (this relation is vacuous 
because the empty set has no elements). Is this set partially ordered? totally ordered? well-ordered? 
Explain. 


Exercise 8.5.2 Give examples of a set X and a relation ≤ such that 


(a) The relation ≤ is reflexive and antisymmetric, but not transitive; 
(b) The relation ≤ is reflexive and transitive, but not antisymmetric; 
(c) The relation ≤ is antisymmetric and transitive, but not reflexive. 


Exercise 8.5.3 Given two positive integers n, m ∈ N\{0}, we say that n divides m, and write n|m, 
if there exists a positive integer a such that m = na. Show that the set N\{0} with the ordering 
relation | is a partially ordered set but not a totally ordered one. Note that this is a different ordering 
relation from the usual ≤ ordering of N\{0}. 


Exercise 8.5.4 Show that the set of positive reals R^+ := {x ∈ R : x > 0} has no minimal element. 


Exercise 8.5.5 Let f : X → Y be a function from one set X to another set Y. Suppose that Y is 
partially ordered with some ordering relation ≤_Y. Define a relation ≤_X on X by defining x ≤_X x' 
if and only if f(x) <_Y f(x') or x = x'. Show that this relation ≤_X turns X into a partially ordered 
set. If we know in addition that the relation ≤_Y makes Y totally ordered, does this mean that the 
relation ≤_X makes X totally ordered also? If not, what additional assumption needs to be made on 
f in order to ensure that ≤_X makes X totally ordered? 
Exercise 8.5.6 Let X be a partially ordered set. For any x in X, define the order ideal (x) ⊆ X to 
be the set (x) := {y ∈ X : y ≤ x}. Let (X) := {(x) : x ∈ X} be the set of all order ideals, and let 
f : X → (X) be the map f(x) := (x) that sends every element of X to its order ideal. Show that f 
is a bijection, and that given any x, y ∈ X, that x ≤_X y if and only if f(x) ⊆ f(y). This exercise 
shows that any partially ordered set can be represented by a collection of sets whose ordering 
relation is given by set inclusion. 


Exercise 8.5.7 Let X be a partially ordered set, and let Y be a totally ordered subset of X. Show 
that Y can have at most one maximum and at most one minimum. 


Exercise 8.5.8 Show that every finite non-empty subset of a totally ordered set has a minimum 
and a maximum. (Hint: use induction.) Conclude in particular that every finite totally ordered set 
is well-ordered. 


Exercise 8.5.9 Let X be a totally ordered set such that every non-empty subset of X has both 
a minimum and a maximum. Show that X is finite. (Hint: assume for sake of contradiction that 
X is infinite. Start with the minimal element x_0 of X and then construct an increasing sequence 
x_0 < x_1 < ... in X.) 


Exercise 8.5.10 Prove Proposition 8.5.10, without using the axiom of choice. (Hint: consider the 
set 
Y := {n ∈ X : P(m) is false for some m ∈ X with m ≤_X n}, 


and show that Y being non-empty would lead to a contradiction.) 


Exercise 8.5.11 Let X be a partially ordered set, and let Y and Y’ be well-ordered subsets of X. 
Show that Y U Y’ is well-ordered if and only if it is totally ordered. 


Exercise 8.5.12 Let X and Y be partially ordered sets with ordering relations ≤_X and ≤_Y, 
respectively. Define a relation ≤_{X×Y} on the Cartesian product X × Y by defining (x, y) ≤_{X×Y} (x', y') if 
x <_X x', or if x = x' and y ≤_Y y'. (This is called the lexicographical ordering on X × Y, and is 
similar to the alphabetical ordering of words; a word w appears earlier in a dictionary than another 
word w' if the first letter of w is earlier in the alphabet than the first letter of w', or if the first letters 
match and the second letter of w is earlier than the second letter of w', and so forth.) Show that 
≤_{X×Y} defines a partial ordering on X × Y. Furthermore, show that if X and Y are totally ordered, 
then so is X × Y, and if X and Y are well-ordered, then so is X × Y. 


Exercise 8.5.13 Prove the claim in the proof of Lemma 8.5.14, namely that every element of Y'\Y 
is an upper bound for Y and vice versa. (Hint: Show using Proposition 8.5.10 that 

{y ∈ Y : y ≤ a} = {y ∈ Y' : y ≤ a} = {y ∈ Y ∩ Y' : y ≤ a} 

for all a ∈ Y ∩ Y'. Conclude that Y ∩ Y' is good, and hence s(Y ∩ Y') exists. Show that s(Y ∩ Y') = 
min(Y'\Y) if Y'\Y is non-empty, and similarly with Y and Y' interchanged. Since Y'\Y and Y\Y' 
are disjoint, one can then conclude that one of these sets is empty, at which point the claim becomes 
easy to establish.) 


Exercise 8.5.14. Use Lemma 8.5.14 to prove Lemma 8.5.15. (Hint: first show that if X had no 
maximal elements, then any subset of X which has an upper bound, also has a strict upper bound.) 


Exercise 8.5.15 Let A and B be two non-empty sets such that A does not have lesser or equal 
cardinality to B. Using Zorn’s lemma, prove that B has lesser or equal cardinality to A. (Hint: 
for every subset X ⊆ B, let P(X) denote the property that there exists an injective map from X 
to A.) This exercise (combined with Exercise 8.3.3) shows that the cardinality of any two sets is 
comparable, as long as one assumes the axiom of choice. 


Exercise 8.5.16 Let X be a set, and let P be the set of all partial orderings of X. (For instance, 
if X := N\{0}, then both the usual partial ordering ≤, and the partial ordering in Exercise 8.5.3, 
are elements of P.) We say that one partial ordering ≤ ∈ P is coarser than another partial ordering 
≤' ∈ P if for any x, y ∈ X, we have the implication (x ≤ y) ⟹ (x ≤' y). Thus for instance the 
partial ordering in Exercise 8.5.3 is coarser than the usual ordering ≤. Let us write ≤ ⪯ ≤' if ≤ is 
coarser than ≤'. Show that ⪯ turns P into a partially ordered set; thus the set of partial orderings 
on X is itself partially ordered. There is exactly one minimal element of P; what is it? Show that 
the maximal elements of P are precisely the total orderings of X. Using Zorn's lemma, show that 
given any partial ordering ≤ of X there exists a total ordering ≤' such that ≤ is coarser than ≤'. 


Exercise 8.5.17 Use Zorn's lemma to give another proof of the claim in Exercise 8.4.2. (Hint: let 
Ω be the set of all Y ⊆ ⋃_{α∈I} X_α such that #(Y ∩ X_α) ≤ 1 for all α ∈ I, i.e., all sets which intersect 
each X_α in at most one element. Use Zorn's lemma to locate a maximal element of Ω.) Deduce that 
Zorn's lemma and the axiom of choice are in fact logically equivalent (i.e., they can be deduced 
from each other). 


Exercise 8.5.18 Using Zorn’s lemma, prove Hausdorff’s maximality principle: if X is a partially 
ordered set, then there exists a totally ordered subset Y of X which is maximal with respect to set 
inclusion (i.e., there is no other totally ordered subset Y’ of X which contains Y). Conversely, show 
that if Hausdorff’s maximality principle is true, then Zorn’s lemma is true. Thus by Exercise 8.5.17, 
these two statements are logically equivalent to the axiom of choice. 


Exercise 8.5.19 Let X be a set, and let $\Omega$ be the space of all pairs $(Y, \leq)$, where Y is a subset 
of X and $\leq$ is a well-ordering of Y. If $(Y, \leq)$ and $(Y', \leq')$ are elements of $\Omega$, we say that $(Y, \leq)$ 
is an initial segment of $(Y', \leq')$ if there exists an $x \in Y'$ such that $Y := \{y \in Y' : y <' x\}$ (so in 
particular $Y \subsetneq Y'$), and for any $y, y' \in Y$, $y \leq y'$ if and only if $y \leq' y'$. Define a relation $\preceq$ on $\Omega$ by 
defining $(Y, \leq) \preceq (Y', \leq')$ if either $(Y, \leq) = (Y', \leq')$, or if $(Y, \leq)$ is an initial segment of $(Y', \leq')$. 
Show that $\preceq$ is a partial ordering of $\Omega$. There is exactly one minimal element of $\Omega$; what is it? Show 
that the maximal elements of $\Omega$ are precisely the well-orderings $(X, \leq)$ of X. Using Zorn’s lemma, 
conclude the well-ordering principle: every set X has at least one well-ordering. Conversely, use 
the well-ordering principle to prove the axiom of choice, Axiom 8.1. (Hint: place a well-ordering 
$\leq$ on $\bigcup_{\alpha \in I} X_\alpha$, and then consider the minimal elements of each $X_\alpha$.) We thus see that the axiom 
of choice, Zorn’s lemma, and the well-ordering principle are all logically equivalent to each other. 


Exercise 8.5.20 Let X be a set, and let $\Omega \subset 2^X$ be a collection of subsets of X. Assume that $\Omega$ does 
not contain the empty set $\emptyset$. Using Zorn’s lemma, show that there is a subcollection $\Omega' \subseteq \Omega$ such 
that all the elements of $\Omega'$ are disjoint from each other (i.e., $A \cap B = \emptyset$ whenever A, B are distinct 
elements of $\Omega'$), but that all the elements of $\Omega$ intersect at least one element of $\Omega'$ (i.e., for all $C \in \Omega$ 
there exists $A \in \Omega'$ such that $C \cap A \neq \emptyset$). (Hint: consider all the subsets of $\Omega$ whose elements are 
all disjoint from each other, and locate a maximal element of this collection.) Conversely, if the 
above claim is true, show that it implies the claim in Exercise 8.4.2, and thus this is yet another 
claim which is logically equivalent to the axiom of choice. (Hint: let $\Omega$ be the set of all pair sets of 
the form $\{(0, \alpha), (1, x_\alpha)\}$, where $\alpha \in I$ and $x_\alpha \in X_\alpha$.) 


Chapter 9 
Continuous Functions on R 


In previous chapters we have been focusing primarily on sequences. A sequence 
$(a_n)_{n=0}^\infty$ can be viewed as a function from N to R, i.e., an object which assigns a real 
number $a_n$ to each natural number n. We then did various things with these functions 
from N to R, such as take their limit at infinity (if the function was convergent), or 
form suprema, infima, etc., or compute the sum of all the elements in the sequence 
(again, assuming the series was convergent). 

Now we will look at functions not on the natural numbers N, which are “discrete”, 
but instead look at functions on a continuum¹ such as the real line R, or perhaps on 
an interval such as $\{x \in \mathbf{R} : a \leq x \leq b\}$. Eventually we will perform a number of 
operations on these functions, including taking limits, computing derivatives, and 
evaluating integrals. In this chapter we will focus primarily on limits of functions, 
and on the closely related concept of a continuous function. 

Before we discuss functions, though, we must first set out some notation for 
subsets of the real line. 


9.1 Subsets of the Real Line 


Very often in analysis we do not work on the whole real line R, but on certain subsets 
of the real line, such as the positive real axis $\{x \in \mathbf{R} : x > 0\}$. Also, we occasionally 
work with the extended real line $\mathbf{R}^*$ defined in Sect. 6.2, or with subsets of that extended 
real line. 


¹ We will not rigorously define the notion of a discrete set or a continuum in this text, but roughly 
speaking a set is discrete if each element is separated from the rest of the set by some non-zero 
distance, whereas a set is a continuum if it is connected and contains no “holes”. 

There are of course infinitely many subsets of the real line; indeed, Cantor’s 
theorem (Theorem 8.3.1; see also Exercise 8.3.4) shows that there are even more 
such sets than there are real numbers. However, there are certain special subsets of 
the real line (and the extended real line) which arise quite often. One such family of 
sets is the intervals. 


Definition 9.1.1 (Intervals) Let $a, b \in \mathbf{R}^*$ be extended real numbers. We define the 
closed interval $[a, b]$ by 

$$[a, b] := \{x \in \mathbf{R}^* : a \leq x \leq b\},$$

the half-open intervals $[a, b)$ and $(a, b]$ by 

$$[a, b) := \{x \in \mathbf{R}^* : a \leq x < b\}; \quad (a, b] := \{x \in \mathbf{R}^* : a < x \leq b\},$$

and the open interval $(a, b)$ by 

$$(a, b) := \{x \in \mathbf{R}^* : a < x < b\}.$$

We call a the left endpoint of these intervals, and b the right endpoint. 


Remark 9.1.2 Once again, we are overloading the parenthesis notation; for instance, 
we are now using (2, 3) to denote both an open interval from 2 to 3 and an ordered 
pair in the Cartesian plane $\mathbf{R}^2 := \mathbf{R} \times \mathbf{R}$. This can cause some genuine ambiguity, 
but the reader should still be able to resolve which meaning of the parentheses is 
intended from context. In some texts, this issue is resolved by using reversed brackets 
instead of parentheses, and thus for instance $[a, b)$ would now be $[a, b[$, $(a, b]$ would 
be $]a, b]$, and $(a, b)$ would be $]a, b[$. 


Examples 9.1.3 If a and b are real numbers (i.e., not equal to $+\infty$ or $-\infty$), then 
all of the above intervals are subsets of the real line, for instance $[2, 3) = \{x \in \mathbf{R} : 2 \leq x < 3\}$. 
The positive real axis $\{x \in \mathbf{R} : x > 0\}$ is the open interval $(0, +\infty)$, 
while the non-negative real axis $\{x \in \mathbf{R} : x \geq 0\}$ is the half-open interval $[0, +\infty)$. 
Similarly, the negative real axis $\{x \in \mathbf{R} : x < 0\}$ is $(-\infty, 0)$, and the non-positive 
real axis $\{x \in \mathbf{R} : x \leq 0\}$ is $(-\infty, 0]$. Finally, the real line R itself is the open interval 
$(-\infty, +\infty)$, while the extended real line $\mathbf{R}^*$ is the closed interval $[-\infty, +\infty]$. We 
sometimes refer to an interval in which one endpoint is infinite (either $+\infty$ or $-\infty$) 
as a half-infinite interval, and an interval in which both endpoints are infinite as a doubly 
infinite interval; all other intervals are bounded intervals. Thus $[2, 3)$ is a bounded 
interval, the positive and negative real axes are half-infinite intervals, and R and $\mathbf{R}^*$ 
are doubly infinite intervals. 


Example 9.1.4 If $a > b$, then all four of the intervals $[a, b]$, $[a, b)$, $(a, b]$, and $(a, b)$ 
are the empty set (why?). If $a = b$, then the three intervals $[a, b)$, $(a, b]$, and $(a, b)$ 
are the empty set, while $[a, b]$ is just the singleton set $\{a\}$ (why?). Because of this, 
we call these intervals degenerate; most (but not all) of our analysis will be restricted 
to non-degenerate intervals. 

Of course intervals are not the only interesting subsets of the real line. Other 
important examples include the natural numbers N, the integers Z, and the rationals 
Q. One can form additional sets using such operations as union and intersection 
(see Sect. 3.1); for instance one could have a disconnected union of two intervals 
such as $(1, 2) \cup [3, 4]$, or one could consider the set $[-1, 1] \cap \mathbf{Q}$ of rational numbers 
between $-1$ and 1 inclusive. Clearly there are infinitely many possibilities of sets 
one could create by such operations. 

Just as sequences of real numbers have limit points, sets of real numbers have 
adherent points, which we now define. 


Definition 9.1.5 ($\varepsilon$-adherent points) Let X be a subset of R, let $\varepsilon > 0$, and let $x \in \mathbf{R}$. 
We say that x is $\varepsilon$-adherent to X iff there exists a $y \in X$ which is $\varepsilon$-close to x (i.e., 
$|x - y| \leq \varepsilon$). 


Remark 9.1.6 The terminology “$\varepsilon$-adherent” is not standard in the literature. However, 
we shall shortly use it to define the notion of an adherent point, which is 
standard. 


Example 9.1.7 The point 1.1 is 0.5-adherent to the open interval (0, 1), but is not 
0.1-adherent to this interval (why?). The point 1.1 is 0.5-adherent to the finite set 
{1, 2, 3}. The point 1 is 0.5-adherent to {1, 2, 3} (why?). 
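
The claims in Example 9.1.7 can be checked mechanically. The following is only a sketch: the helper names are our own, exact rational arithmetic is used so that the boundary case 1.1 versus 0.1 is not blurred by rounding, and the interval test assumes $a < b$ and $\varepsilon > 0$.

```python
from fractions import Fraction


def eps_adherent_to_finite_set(x, points, eps):
    """True iff some y in the finite set `points` satisfies |x - y| <= eps."""
    return any(abs(x - y) <= eps for y in points)


def eps_adherent_to_open_interval(x, a, b, eps):
    """True iff some y with a < y < b satisfies |x - y| <= eps (assumes a < b, eps > 0)."""
    # The points within eps of x form the closed interval [x - eps, x + eps];
    # it meets the open interval (a, b) exactly when x - eps < b and x + eps > a.
    return x - eps < b and x + eps > a


x = Fraction(11, 10)  # the point 1.1, stored exactly
print(eps_adherent_to_open_interval(x, 0, 1, Fraction(1, 2)))    # True
print(eps_adherent_to_open_interval(x, 0, 1, Fraction(1, 10)))   # False
print(eps_adherent_to_finite_set(x, [1, 2, 3], Fraction(1, 2)))  # True
print(eps_adherent_to_finite_set(1, [1, 2, 3], Fraction(1, 2)))  # True
```

The second line prints False because the only candidate at distance exactly 0.1 from 1.1 is the point 1, which does not belong to (0, 1).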


Definition 9.1.8 (Adherent points) Let X be a subset of R, and let $x \in \mathbf{R}$. We say 
that x is an adherent point of X iff it is $\varepsilon$-adherent to X for every $\varepsilon > 0$. 


Example 9.1.9 The number 1 is $\varepsilon$-adherent to the open interval (0, 1) for every 
$\varepsilon > 0$ (why?) and is thus an adherent point of (0, 1). The point 0.5 is similarly an 
adherent point of (0, 1). However, the number 2 is not 0.5-adherent (for instance) to 
(0, 1) and is thus not an adherent point of (0, 1). 


Definition 9.1.10 (Closure) Let X be a subset of R. The closure of X, sometimes 
denoted $\overline{X}$, is defined to be the set of all the adherent points of X. 


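Lemma 9.1.11 (Elementary properties of closures) Let X and Y be arbitrary subsets 
of R. Then $X \subseteq \overline{X}$, $\overline{X \cup Y} = \overline{X} \cup \overline{Y}$, and $\overline{X \cap Y} \subseteq \overline{X} \cap \overline{Y}$. If $X \subseteq Y$, then $\overline{X} \subseteq \overline{Y}$. 


Proof See Exercise 9.1.1. 

We now compute some closures. 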


Lemma 9.1.12 (Closures of intervals) Let $a < b$ be real numbers, and let I be any 
one of the four intervals $(a, b)$, $(a, b]$, $[a, b)$, or $[a, b]$. Then the closure of I is $[a, b]$. 
Similarly, the closure of $(a, \infty)$ or $[a, \infty)$ is $[a, \infty)$, while the closure of $(-\infty, a)$ 
or $(-\infty, a]$ is $(-\infty, a]$. Finally, the closure of $(-\infty, \infty)$ is $(-\infty, \infty)$. 


Proof We will just show one of these facts, namely that the closure of $(a, b)$ is $[a, b]$; 
the other results are proven similarly (or one can use Exercise 9.1.6). 

First let us show that every element of $[a, b]$ is adherent to $(a, b)$. Let $x \in [a, b]$. 
If $x \in (a, b)$, then it is definitely adherent to $(a, b)$. If $x = b$, then x is also adherent 
to $(a, b)$ (why?). Similarly when $x = a$. Thus every point in $[a, b]$ is adherent to 
$(a, b)$. 

Now we show that every point x that is adherent to $(a, b)$ lies in $[a, b]$. Suppose 
for sake of contradiction that x does not lie in $[a, b]$; then either $x > b$ or $x < a$. If 
$x > b$ then x is not $(x - b)$-adherent to $(a, b)$ (why?) and is hence not an adherent 
point of $(a, b)$. Similarly, if $x < a$, then x is not $(a - x)$-adherent to $(a, b)$ and is 
hence not an adherent point of $(a, b)$. This contradiction shows that x is in fact in 
$[a, b]$ as claimed. 


Lemma 9.1.13 The closure of N is N. The closure of Z is Z. The closure of Q is R, 
and the closure of R is R. The closure of the empty set $\emptyset$ is $\emptyset$. 


Proof See Exercise 9.1.2. 


The following lemma shows that adherent points of a set X can be obtained as 
the limit of elements in X: 


Lemma 9.1.14 Let X be a subset of R, and let $x \in \mathbf{R}$. Then x is an adherent point 
of X if and only if there exists a sequence $(a_n)_{n=0}^\infty$, consisting entirely of elements in 
X, which converges to x. 


Proof See Exercise 9.1.4. 
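
To make Lemma 9.1.14 concrete, here is a small numerical illustration for the adherent point $x = 1$ of $X = (0, 1)$. The particular sequence $a_n := 1 - 1/(n+2)$ is our own choice and not taken from the text; a finite computation of this kind illustrates the lemma but of course proves nothing.

```python
from fractions import Fraction


def a(n):
    """n-th term of an illustrative sequence in (0, 1) that converges to 1."""
    return 1 - Fraction(1, n + 2)


terms = [a(n) for n in range(1000)]

# Every computed term lies in X = (0, 1) ...
assert all(0 < t < 1 for t in terms)

# ... and the distance |1 - a_n| = 1/(n + 2) shrinks as n grows.
gaps = [1 - t for t in terms]
assert all(later < earlier for earlier, later in zip(gaps, gaps[1:]))

print(gaps[0], gaps[9], gaps[99], gaps[999])
```

Given any $\varepsilon > 0$, the term $a_n$ with $n + 2 \geq 1/\varepsilon$ witnesses that 1 is $\varepsilon$-adherent to (0, 1), which is the easy direction of the lemma in this special case.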


Definition 9.1.15 A subset $E \subseteq \mathbf{R}$ is said to be closed if $\overline{E} = E$, or in other words 
that E contains all of its adherent points. 


Examples 9.1.16 From Lemma 9.1.12 we see that if $a < b$ are real numbers, then 
$[a, b]$, $[a, +\infty)$, $(-\infty, a]$, and $(-\infty, +\infty)$ are closed, while $(a, b)$, $(a, b]$, $[a, b)$, 
$(a, +\infty)$, and $(-\infty, a)$ are not. From Lemma 9.1.13 we see that N, Z, R, $\emptyset$ are 
closed, while Q is not. 


From Lemma 9.1.14 we can define closure in terms of sequences: 


Corollary 9.1.17 Let X be a subset of R. If X is closed, and $(a_n)_{n=0}^\infty$ is a convergent 
sequence consisting of elements in X, then $\lim_{n \to \infty} a_n$ also lies in X. Conversely, if 
it is true that every convergent sequence $(a_n)_{n=0}^\infty$ of elements in X has its limit in X 
as well, then X is necessarily closed. 


When we study differentiation in the next chapter, we shall need to replace the 
concept of an adherent point by the closely related notion of a limit point. 


Definition 9.1.18 (Limit points) Let X be a subset of the real line. We say that x is a 
limit point (or a cluster point) of X iff it is an adherent point of $X \setminus \{x\}$. We say that x 
is an isolated point of X if $x \in X$ and there exists some $\varepsilon > 0$ such that $|x - y| > \varepsilon$ 
for all $y \in X \setminus \{x\}$. 


Example 9.1.19 Let X be the set $X = (1, 2) \cup \{3\}$. Then 3 is an adherent point of 
X, but it is not a limit point of X, since 3 is not adherent to $X \setminus \{3\} = (1, 2)$; instead, 
3 is an isolated point of X. On the other hand, 2 is still a limit point of X, since 2 is 
adherent to $X \setminus \{2\} = X$; but it is not isolated (why?). 


Remark 9.1.20 From Lemma 9.1.14 we see that x is a limit point of X iff there 
exists a sequence $(a_n)_{n=0}^\infty$, consisting entirely of elements in X that are distinct from 
x, and such that $(a_n)_{n=0}^\infty$ converges to x. It turns out that the set of adherent points 
splits into the set of limit points and the set of isolated points (Exercise 9.1.9). 


Lemma 9.1.21 Let I be an interval (possibly infinite), i.e., I is a set of the form 
$(a, b)$, $(a, b]$, $[a, b)$, $[a, b]$, $(a, +\infty)$, $[a, +\infty)$, $(-\infty, a)$, or $(-\infty, a]$, with $a < b$ 
in the first four cases. Then every element of I is a limit point of I. 


Proof We show this for the case $I = [a, b]$; the other cases are similar and are left 
to the reader. Let $x \in I$; we have to show that x is a limit point of I. There are 
three cases: $x = a$, $a < x < b$, and $x = b$. If $x = a$, then consider the sequence 
$(x + \frac{1}{n})_{n=N}^\infty$. This sequence converges to x and will lie inside $I \setminus \{a\} = (a, b]$ if N 
is chosen large enough (why?). Thus by Remark 9.1.20 we see that $x = a$ is a limit 
point of $[a, b]$. A similar argument works when $a < x < b$. When $x = b$ one has 
to use the sequence $(x - \frac{1}{n})_{n=N}^\infty$ instead (why?) but the argument is otherwise the 
same. 


Next, we define the concept of a bounded set. 


Definition 9.1.22 (Bounded sets) A subset X of the real line is said to be bounded 
if we have $X \subseteq [-M, M]$ for some real number $M > 0$. A subset X of the real line 
is unbounded if it is not bounded. 


Example 9.1.23 For any real numbers a, b, the interval $[a, b]$ is bounded, because it 
is contained inside $[-M, M]$, where $M := \max(|a|, |b|)$. However, the half-infinite 
interval $[0, +\infty)$ is unbounded (why?). In fact, no half-infinite interval or doubly 
infinite interval can be bounded. The sets N, Z, Q, and R are all unbounded (why?). 


A basic property of closed and bounded sets is the following. 


Theorem 9.1.24 (Heine–Borel theorem for the line) Let X be a subset of R. Then 
the following two statements are equivalent: 


(a) X is closed and bounded. 

(b) Given any sequence $(a_n)_{n=0}^\infty$ of real numbers which takes values in X (i.e., $a_n \in X$ 
for all n), there exists a subsequence $(a_{n_j})_{j=0}^\infty$ of the original sequence, which 
converges to some number L in X. 


Proof See Exercise 9.1.13. 


Remark 9.1.25 This theorem shall play a key rôle in subsequent sections of this 
chapter. In the language of metric space topology, it asserts that every subset of the 
real line which is closed and bounded is also compact; see Sect. 1.5. A more 
general version of this theorem, due to Eduard Heine (1821–1881) and Émile Borel 
(1871–1956), can be found in Theorem 1.5.7. 

— Exercises — 

Exercise 9.1.1 Prove Lemma 9.1.11. 


Exercise 9.1.2 Prove Lemma 9.1.13. (Hint: for computing the closure of Q, you will need 
Proposition 5.4.14.) 


Exercise 9.1.3 Give an example of two subsets X, Y of the real line such that $\overline{X \cap Y} \neq \overline{X} \cap \overline{Y}$. 


Exercise 9.1.4 Prove Lemma 9.1.14. (Hint: in order to prove one of the two implications here you 
will need the axiom of choice, as in Lemma 8.4.5.) 


Exercise 9.1.5 Let X be a subset of R. Show that $\overline{X}$ is closed (i.e., $\overline{\overline{X}} = \overline{X}$). Furthermore, show 
that if Y is any closed set that contains X, then Y also contains $\overline{X}$. Thus the closure $\overline{X}$ of X is the 
smallest closed set which contains X. 


Exercise 9.1.6 Let X be any subset of the real line, and let Y be a set such that $X \subseteq Y \subseteq \overline{X}$. Show 
that $\overline{Y} = \overline{X}$. 


Exercise 9.1.7 Let $n \geq 1$ be a positive integer, and let $X_1, \ldots, X_n$ be closed subsets of R. Show 
that $X_1 \cup X_2 \cup \ldots \cup X_n$ is also closed. 


Exercise 9.1.8 Let I be a non-empty set (possibly infinite), and for each $\alpha \in I$ let $X_\alpha$ be a closed 
subset of R. Show that the intersection $\bigcap_{\alpha \in I} X_\alpha$ (defined in (3.3)) is also closed. 


Exercise 9.1.9 Let X be a subset of the real line. Show that every adherent point of X is either a 
limit point or an isolated point of X, but cannot be both. Conversely, show that every limit point 
and every isolated point of X is an adherent point of X. 


Exercise 9.1.10 If X is a non-empty subset of R, show that X is bounded if and only if inf(X) and 
sup(X) are finite. 


Exercise 9.1.11 Show that if X is a bounded subset of R, then the closure $\overline{X}$ is also bounded. 


Exercise 9.1.12 Show that the union of any finite collection of bounded subsets of R is still a 
bounded set. Is this conclusion still true if one takes an infinite collection of bounded subsets of R? 


Exercise 9.1.13 Prove Theorem 9.1.24. (Hint: to show (a) implies (b), use the Bolzano–Weierstrass 
theorem (Theorem 6.6.8) and Corollary 9.1.17. To show (b) implies (a), argue by contradiction, 
using Corollary 9.1.17 to establish that X is closed. You will need the axiom of choice to show that 
X is bounded, as in Lemma 8.4.5.) 


Exercise 9.1.14 Show that any finite subset of R is closed and bounded. 


Exercise 9.1.15 Let E be a bounded non-empty subset of R, and let $S := \sup(E)$ be the least upper 
bound of E. (Note from the least upper bound principle, Theorem 5.5.9, that S is a real number.) 
Show that S is an adherent point of E and is also an adherent point of $\mathbf{R} \setminus E$. 


9.2 The Algebra of Real-Valued Functions 


You are familiar with many functions $f : \mathbf{R} \to \mathbf{R}$ from the real line to the real 
line. Some examples are: $f(x) := x^2 + 3x + 5$; $f(x) := 2^x/(x^2 + 1)$; $f(x) := 
\sin(x) \exp(x)$ (we will define sin and exp formally in Chap. 4). These are functions 
from R to R since to every real number x they assign a single real number $f(x)$. We 
can also consider more exotic functions, e.g., 

$$f(x) := \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

This function is not algebraic (i.e., it cannot be expressed in terms of x purely by 
using the standard algebraic operations of $+$, $-$, $\times$, $/$, $\sqrt{\ }$, etc.; we will not need this 
notion in this text), but it is still a function from R to R, because it still assigns a real 
number $f(x)$ to each $x \in \mathbf{R}$. 

We can take any one of the previous functions $f : \mathbf{R} \to \mathbf{R}$ defined on all of R, 
and restrict the domain to a smaller set $X \subseteq \mathbf{R}$, creating a new function, sometimes 
called $f|_X$, from X to R. This is the same function as the original function f, but is 
only defined on a smaller domain. (Thus $f|_X(x) := f(x)$ when $x \in X$, and $f|_X(x)$ 
is undefined when $x \notin X$.) For instance, we can restrict the function $f(x) := x^2$, 
which is initially defined from R to R, to the interval [1, 2], thus creating a new 
function $f|_{[1,2]} : [1, 2] \to \mathbf{R}$, which is defined as $f|_{[1,2]}(x) = x^2$ when $x \in [1, 2]$ 
but is undefined elsewhere. 
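
For readers who like to experiment, restriction of the domain can be mimicked in code. The sketch below is one possible modelling choice and not a canonical one: the helper name `restrict` is hypothetical, the smaller domain is described by a membership test, and "undefined" is modelled by raising an error.

```python
def restrict(f, in_domain):
    """Return f restricted to the set described by the predicate `in_domain`."""
    def restricted(x):
        if not in_domain(x):
            # Outside the smaller domain the restricted function is undefined.
            raise ValueError(f"{x} is outside the restricted domain")
        return f(x)
    return restricted


def square(x):
    return x ** 2


# The restriction of x -> x^2 to the interval [1, 2].
square_on_1_2 = restrict(square, lambda x: 1 <= x <= 2)

print(square_on_1_2(1.5))  # 2.25
try:
    square_on_1_2(3)
except ValueError as err:
    print("undefined:", err)
```

Note that `square` and `square_on_1_2` agree wherever both are defined, yet they are different objects with different domains, in line with the distinction drawn in the text.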

One could also restrict the codomain from R to some smaller subset Y of R, 
provided of course that all the values of $f(x)$ lie inside Y. For instance, the function 
$f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x^2$ could also be thought of as a function from R to 
$[0, \infty)$, instead of a function from R to R. Formally, these two functions are different 
functions, but the distinction between them is so minor that we shall often be careless 
about the range of a function in our discussion. 

Strictly speaking, there is a distinction between a function f, and its value $f(x)$ at 
a point x. f is a function; but $f(x)$ is a number (which depends on some free variable 
x). This distinction is rather subtle and we will not stress it too much, but there are 
times when one has to distinguish between the two. For instance, if $f : \mathbf{R} \to \mathbf{R}$ is 
the function $f(x) := x^2$, and $g := f|_{[1,2]}$ is the restriction of f to the interval [1, 2], 
then f and g both perform the operation of squaring, i.e., $f(x) = x^2$ and $g(x) = x^2$, 
but the two functions f and g are not considered the same function, $f \neq g$, because 
they have different domains. Despite this distinction, we shall often be careless, 
and say things like “consider the function $x^2 + 2x + 3$” when really we should be 
saying “consider the function $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x^2 + 2x + 3$”. (This 
distinction makes more of a difference when we start doing things like differentiation. 
For instance, if $f : \mathbf{R} \to \mathbf{R}$ is the function $f(x) = x^2$, then of course $f(3) = 9$, but 
the derivative of f at 3 is 6, whereas the derivative of 9 is of course 0, so we cannot 
simply “differentiate both sides” of $f(3) = 9$ and conclude that $6 = 0$.) 

If X is a subset of R, and $f : X \to \mathbf{R}$ is a function, we can form the graph 
$\{(x, f(x)) : x \in X\}$ of the function f; this is a subset of $X \times \mathbf{R}$, and hence a subset 
of the Euclidean plane $\mathbf{R}^2 = \mathbf{R} \times \mathbf{R}$. One can certainly study a function through 
its graph, by using the geometry of the plane $\mathbf{R}^2$ (e.g., employing such concepts as 
tangent lines, area, and so forth). We however will pursue a more “analytic” approach, 
in which we rely instead on the properties of the real numbers to analyze these 
functions. The two approaches are complementary; the geometric approach offers 
more visual intuition, while the analytic approach offers rigor and precision. Both 
the geometric intuition and the analytic formalism become useful when extending 
analysis of functions of one variable to functions of many variables (or possibly even 
infinitely many variables). 

Just as numbers can be manipulated arithmetically, so can functions: the sum of 
two functions is a function, the product of two functions is a function, and so forth. 


Definition 9.2.1 (Arithmetic operations on functions) Given two functions $f : X \to 
\mathbf{R}$ and $g : X \to \mathbf{R}$, we can define their sum $f + g : X \to \mathbf{R}$ by the formula 

$$(f + g)(x) := f(x) + g(x),$$

their difference $f - g : X \to \mathbf{R}$ by the formula 

$$(f - g)(x) := f(x) - g(x),$$

their maximum $\max(f, g) : X \to \mathbf{R}$ by 

$$\max(f, g)(x) := \max(f(x), g(x)),$$

their minimum $\min(f, g) : X \to \mathbf{R}$ by 

$$\min(f, g)(x) := \min(f(x), g(x)),$$

their product $fg : X \to \mathbf{R}$ (or $f \cdot g : X \to \mathbf{R}$) by the formula 

$$(fg)(x) := f(x) g(x),$$

and (provided that $g(x) \neq 0$ for all $x \in X$) the quotient $f/g : X \to \mathbf{R}$ by the formula 

$$(f/g)(x) := f(x)/g(x).$$

Finally, if c is a real number, we can define the function $cf : X \to \mathbf{R}$ (or $c \cdot f : X \to 
\mathbf{R}$) by the formula 

$$(cf)(x) := c \times f(x).$$


Example 9.2.2 If $f : \mathbf{R} \to \mathbf{R}$ is the function $f(x) := x^2$, and $g : \mathbf{R} \to \mathbf{R}$ is the 
function $g(x) := 2x$, then $f + g : \mathbf{R} \to \mathbf{R}$ is the function $(f + g)(x) := x^2 + 2x$, 
while $fg : \mathbf{R} \to \mathbf{R}$ is the function $fg(x) = 2x^3$. Similarly $f - g : \mathbf{R} \to \mathbf{R}$ is the 
function $(f - g)(x) := x^2 - 2x$, while $6f : \mathbf{R} \to \mathbf{R}$ is the function $(6f)(x) = 6x^2$. 
Observe that $fg$ is not the same function as $f \circ g$, which maps $x \mapsto 4x^2$, nor is it 
the same as $g \circ f$, which maps $x \mapsto 2x^2$ (why?). Thus multiplication of functions 
and composition of functions are two different operations. 
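
The pointwise operations of Definition 9.2.1, and their difference from composition, can be tried out directly. In the sketch below the helper names `add`, `mul`, `scale` and `compose` are our own; the functions f and g are those of Example 9.2.2, evaluated at the sample point $x = 3$.

```python
def add(f, g):
    """Pointwise sum: (f + g)(x) := f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def mul(f, g):
    """Pointwise product: (fg)(x) := f(x) g(x)."""
    return lambda x: f(x) * g(x)


def scale(c, f):
    """Scalar multiple: (cf)(x) := c * f(x)."""
    return lambda x: c * f(x)


def compose(f, g):
    """Composition: (f o g)(x) := f(g(x))."""
    return lambda x: f(g(x))


f = lambda x: x ** 2   # f(x) := x^2
g = lambda x: 2 * x    # g(x) := 2x

x = 3
print(add(f, g)(x))      # 15 = 3^2 + 2*3
print(mul(f, g)(x))      # 54 = 2 * 3^3
print(scale(6, f)(x))    # 54 = 6 * 3^2
print(compose(f, g)(x))  # 36 = 4 * 3^2
print(compose(g, f)(x))  # 18 = 2 * 3^2
```

The three values 54, 36 and 18 for $fg$, $f \circ g$ and $g \circ f$ at the same point show concretely that these are three different functions.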


— Exercises — 


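Exercise 9.2.1 Let $f : \mathbf{R} \to \mathbf{R}$, $g : \mathbf{R} \to \mathbf{R}$, $h : \mathbf{R} \to \mathbf{R}$. Which of the following identities are true, 
and which ones are false? In the former case, give a proof; in the latter case, give a counterexample. 

$$\begin{aligned}
(f + g) \circ h &= (f \circ h) + (g \circ h) \\
f \circ (g + h) &= (f \circ g) + (f \circ h) \\
(f + g) \cdot h &= (f \cdot h) + (g \cdot h) \\
f \cdot (g + h) &= (f \cdot g) + (f \cdot h)
\end{aligned}$$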


9.3 Limiting Values of Functions 


In Chap. 6 we defined what it means for a sequence $(a_n)_{n=0}^\infty$ to converge to a limit 
L. We now define a similar notion for what it means for a function f defined on the 
real line, or on some subset of the real line, to converge to some value at a point. Just 
as we used the notions of $\varepsilon$-closeness and eventual $\varepsilon$-closeness to deal with limits of 
sequences, we shall need a notion of $\varepsilon$-closeness and local $\varepsilon$-closeness to deal with 
limits of functions. 


Definition 9.3.1 ($\varepsilon$-closeness) Let X be a subset of R, let $f : X \to \mathbf{R}$ be a function, 
let L be a real number, and let $\varepsilon > 0$ be a real number. We say that the function f is 
$\varepsilon$-close to L iff $f(x)$ is $\varepsilon$-close to L for every $x \in X$. 


Example 9.3.2 When the function $f(x) := x^2$ is restricted to the interval [1, 3], then 
it is 5-close to 4, since when $x \in [1, 3]$ then $1 \leq f(x) \leq 9$, and hence $|f(x) - 4| \leq 5$. 
When instead it is restricted to the smaller interval [1.9, 2.1], then it is 0.41-close to 
4, since if $x \in [1.9, 2.1]$, then $3.61 \leq f(x) \leq 4.41$, and hence $|f(x) - 4| \leq 0.41$. 
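
The two bounds in Example 9.3.2 can be spot-checked numerically. The sketch below samples an evenly spaced grid, which in general only gives a lower bound for the largest deviation; here it happens to find the exact values 5 and 0.41 because the deviation $|x^2 - 4|$ is largest at an endpoint of each interval. The helper name `max_deviation` and the grid size are our own choices.

```python
from fractions import Fraction


def max_deviation(f, L, a, b, steps=1000):
    """Largest |f(x) - L| over an evenly spaced grid on [a, b], endpoints included."""
    grid = (a + (b - a) * Fraction(k, steps) for k in range(steps + 1))
    return max(abs(f(x) - L) for x in grid)


def f(x):
    return x ** 2


print(max_deviation(f, 4, Fraction(1), Fraction(3)))            # 5
print(max_deviation(f, 4, Fraction(19, 10), Fraction(21, 10)))  # 41/100
```

So, on the sampled points at least, f is $\varepsilon$-close to 4 on [1, 3] only for $\varepsilon \geq 5$, and on [1.9, 2.1] only for $\varepsilon \geq 0.41$.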


Definition 9.3.3 (Local $\varepsilon$-closeness) Let X be a subset of R, let $f : X \to \mathbf{R}$ be a 
function, let L be a real number, $x_0$ be an adherent point of X, and $\varepsilon > 0$ be a real 
number. We say that f is $\varepsilon$-close to L near $x_0$ iff there exists a $\delta > 0$ such that f 
becomes $\varepsilon$-close to L when restricted to the set $\{x \in X : |x - x_0| < \delta\}$. 


Example 9.3.4 Let $f : [1, 3] \to \mathbf{R}$ be the function $f(x) := x^2$, restricted to the 
interval [1, 3]. This function is not 0.1-close to 4, since for instance $f(1)$ is not 
0.1-close to 4. However, f is 0.1-close to 4 near 2, since when restricted to the set 
$\{x \in [1, 3] : |x - 2| < 0.01\}$, the function f is indeed 0.1-close to 4. This is because when 
$|x - 2| < 0.01$, we have $1.99 < x < 2.01$, and hence $3.9601 < f(x) < 4.0401$, and 
in particular $f(x)$ is 0.1-close to 4. 


Example 9.3.5 Continuing with the same function f used in the previous example, 
we observe that f is not 0.1-close to 9, since for instance $f(1)$ is not 0.1-close to 
9. However, f is 0.1-close to 9 near 3, since when restricted to the set $\{x \in [1, 3] : 
|x - 3| < 0.01\}$ (which is the same as the half-open interval $(2.99, 3]$; why?), the 
function f becomes 0.1-close to 9 (since if $2.99 < x \leq 3$, then $8.9401 < f(x) \leq 9$, 
and hence $f(x)$ is 0.1-close to 9). 
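
Examples 9.3.4 and 9.3.5 both hinge on whether a proposed $\delta$ works for a given $\varepsilon$. The following sampled check is only a sketch: it tests finitely many points of the set $\{x \in [lo, hi] : |x - x_0| < \delta\}$, so a True answer is evidence rather than proof, whereas a False answer exhibits a genuine counterexample. The helper name `close_near` and the grid size are our own.

```python
from fractions import Fraction


def close_near(f, L, x0, eps, delta, lo, hi, steps=2000):
    """Sampled test that |f(x) - L| <= eps for x in [lo, hi] with |x - x0| < delta."""
    left, right = max(lo, x0 - delta), min(hi, x0 + delta)
    grid = (left + (right - left) * Fraction(k, steps) for k in range(steps + 1))
    # Keep only grid points that satisfy the strict inequality |x - x0| < delta.
    return all(abs(f(x) - L) <= eps for x in grid if abs(x - x0) < delta)


def f(x):
    return x ** 2


tenth, hundredth = Fraction(1, 10), Fraction(1, 100)
print(close_near(f, 4, 2, tenth, hundredth, 1, 3))  # True: delta = 0.01 works near 2
print(close_near(f, 9, 3, tenth, hundredth, 1, 3))  # True: delta = 0.01 works near 3
print(close_near(f, 4, 2, tenth, 1, 1, 3))          # False: delta = 1 is too generous
```

The last call fails because with $\delta = 1$ the sampled set contains points close to 3, where $f(x)$ is nearly 9.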


Definition 9.3.6 (Convergence of functions at a point) Let X be a subset of R, let 
$f : X \to \mathbf{R}$ be a function, let E be a subset of X, $x_0$ be an adherent point of E, 
and let L be a real number. We say that f converges to L at $x_0$ in E and write 
$\lim_{x \to x_0; x \in E} f(x) = L$, iff f, after restricting to E, is $\varepsilon$-close to L near $x_0$ for every 
$\varepsilon > 0$. If f does not converge to any number L at $x_0$, we say that f diverges at $x_0$, 
and leave $\lim_{x \to x_0; x \in E} f(x)$ undefined. 


In other words, we have $\lim_{x \to x_0; x \in E} f(x) = L$ iff for every $\varepsilon > 0$, there exists a 
$\delta > 0$ such that $|f(x) - L| \leq \varepsilon$ for all $x \in E$ such that $|x - x_0| < \delta$. (Why is this 
definition equivalent to the one given above?) 


Remark 9.3.7 In many cases we will omit the set E from the above notation (i.e., 
we will just say that f converges to L at $x_0$, or that $\lim_{x \to x_0} f(x) = L$), although 
this is slightly dangerous. For instance, it sometimes makes a difference whether 
E actually contains $x_0$ or not. To give an example, if $f : \mathbf{R} \to \mathbf{R}$ is the function 
defined by setting $f(x) = 1$ when $x = 0$ and $f(x) = 0$ when $x \neq 0$, then one 
has $\lim_{x \to 0; x \in \mathbf{R} \setminus \{0\}} f(x) = 0$, but $\lim_{x \to 0; x \in \mathbf{R}} f(x)$ is undefined. Some authors only 
define the limit $\lim_{x \to x_0; x \in E} f(x)$ when E does not contain $x_0$ (so that $x_0$ is now a 
limit point of E rather than an adherent point), or would use $\lim_{x \to x_0; x \in E} f(x)$ to 
denote what we would call $\lim_{x \to x_0; x \in E \setminus \{x_0\}} f(x)$, but we have chosen a slightly more 
general notation, which allows the possibility that E contains $x_0$. 


Example 9.3.8 Let $f : [1, 3] \to \mathbf{R}$ be the function $f(x) := x^2$. We have seen before 
that f is 0.1-close to 4 near 2. A similar argument shows that f is 0.01-close to 4 
near 2 (one just has to pick a smaller value of $\delta$). 


Definition 9.3.6 is rather unwieldy. However, we can rewrite this definition in 
terms of a more familiar one, involving limits of sequences. 


Proposition 9.3.9 Let X be a subset of R, let $f : X \to \mathbf{R}$ be a function, let E be a 
subset of X, let $x_0$ be an adherent point of E, and let L be a real number. Then the 
following two statements are logically equivalent: 


(a) f converges to L at $x_0$ in E. 
(b) For every sequence $(a_n)_{n=0}^\infty$ which consists entirely of elements of E and 
converges to $x_0$, the sequence $(f(a_n))_{n=0}^\infty$ converges to L. 


Proof See Exercise 9.3.1. 
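
Statement (b) of Proposition 9.3.9 is easy to probe numerically for $f(x) := x^2$ on $E = [1, 3]$ at $x_0 = 2$. The sequence $a_n := 2 + (-1)^n/(n+1)$ below is our own illustrative choice; it lies in [1, 3] and converges to 2 from both sides. A finite table of values like this one is evidence only, not a proof.

```python
from fractions import Fraction


def f(x):
    return x ** 2


def a(n):
    """Illustrative sequence in [1, 3] converging to 2 (approaches from both sides)."""
    return 2 + Fraction((-1) ** n, n + 1)


# Distance from f(a_n) to the candidate limit L = 4 at a few even indices.
errors = [abs(f(a(n)) - 4) for n in (0, 10, 100, 1000)]
for n, err in zip((0, 10, 100, 1000), errors):
    print(n, float(err))
```

Writing $a_n = 2 + h$ we have $f(a_n) - 4 = 4h + h^2$, which explains why the printed errors shrink roughly like $4/(n+1)$.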


In view of the above proposition, we will sometimes write “$f(x) \to L$ as $x \to x_0$ 
in E” or “f has a limit L at $x_0$ in E” instead of “f converges to L at $x_0$”, or 
“$\lim_{x \to x_0} f(x) = L$”. 


Remark 9.3.10 With the notation of Proposition 9.3.9, we have the following corollary: 
if $\lim_{x \to x_0; x \in E} f(x) = L$, and $\lim_{n \to \infty} a_n = x_0$, then $\lim_{n \to \infty} f(a_n) = L$. 


Remark 9.3.11 We only consider limits of a function f at $x_0$ in the case when $x_0$ is 
an adherent point of E. When $x_0$ is not an adherent point then it is not worth it to 
define the concept of a limit. (Can you see why there will be problems?) 


Remark 9.3.12 The variable x used to denote a limit is a dummy variable; we could 
replace it by any other variable and obtain exactly the same limit. For instance, if 
$\lim_{x \to x_0; x \in E} f(x) = L$, then $\lim_{y \to x_0; y \in E} f(y) = L$, and conversely (why?). 


Proposition 9.3.9 has some immediate corollaries. For instance, we now know 
that a function can have at most one limit at each point: 


Corollary 9.3.13 Let X be a subset of R, let E be a subset of X, let $x_0$ be an adherent 
point of E, and let $f : X \to \mathbf{R}$ be a function. Then f can have at most one limit at 
$x_0$ in E. 


Proof Suppose for sake of contradiction that there are two distinct numbers L and 
$L'$ such that f has a limit L at $x_0$ in E, and such that f also has a limit $L'$ at $x_0$ 
in E. Since $x_0$ is an adherent point of E, we know by Lemma 9.1.14 that there is a 
sequence $(a_n)_{n=0}^\infty$ consisting of elements in E which converges to $x_0$. Since f has a 
limit L at $x_0$ in E, we thus see by Proposition 9.3.9 that $(f(a_n))_{n=0}^\infty$ converges to L. 
But since f also has a limit $L'$ at $x_0$ in E, we see that $(f(a_n))_{n=0}^\infty$ also converges to 
$L'$. But this contradicts the uniqueness of limits of sequences (Proposition 6.1.7). 


Using the limit laws for sequences, one can now deduce the limit laws for functions: 


Proposition 9.3.14 (Limit laws for functions) Let X be a subset of R, let E be a 
subset of X, let $x_0$ be an adherent point of E, and let $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$ 
be functions. Suppose that f has a limit L at $x_0$ in E, and g has a limit M at $x_0$ in 
E. Then $f + g$ has a limit $L + M$ at $x_0$ in E, $f - g$ has a limit $L - M$ at $x_0$ in E, 
$\max(f, g)$ has a limit $\max(L, M)$ at $x_0$ in E, $\min(f, g)$ has a limit $\min(L, M)$ at $x_0$ 
in E and $fg$ has a limit $LM$ at $x_0$ in E. If c is a real number, then $cf$ has a limit 
$cL$ at $x_0$ in E. Finally, if g is non-zero on E (i.e., $g(x) \neq 0$ for all $x \in E$) and M is 
non-zero, then $f/g$ has a limit $L/M$ at $x_0$ in E. 


Proof We just prove the first claim (that $f + g$ has a limit $L + M$); the others are 
very similar and are left to Exercise 9.3.2. Let $(a_n)_{n=0}^\infty$ be an arbitrary sequence 
of elements in E that converges to $x_0$. Since f has a limit L at $x_0$ in E, we thus 
see from Proposition 9.3.9 that $(f(a_n))_{n=0}^\infty$ converges to L. Similarly $(g(a_n))_{n=0}^\infty$ 
converges to M. By the limit laws for sequences (Theorem 6.1.19) we conclude 
that $((f + g)(a_n))_{n=0}^\infty$ converges to $L + M$. By Proposition 9.3.9 again, this implies 
that $f + g$ has a limit $L + M$ at $x_0$ in E as desired (since $(a_n)_{n=0}^\infty$ was an arbitrary 
sequence in E converging to $x_0$). 


Remark 9.3.15 One can phrase Proposition 9.3.14 more informally as saying that 

$$\begin{aligned}
\lim_{x \to x_0} (f + g)(x) &= \lim_{x \to x_0} f(x) + \lim_{x \to x_0} g(x) \\
\lim_{x \to x_0} \max(f, g)(x) &= \max\left( \lim_{x \to x_0} f(x), \lim_{x \to x_0} g(x) \right) \\
\lim_{x \to x_0} \min(f, g)(x) &= \min\left( \lim_{x \to x_0} f(x), \lim_{x \to x_0} g(x) \right) \\
\lim_{x \to x_0} (fg)(x) &= \lim_{x \to x_0} f(x) \lim_{x \to x_0} g(x) \\
\lim_{x \to x_0} (f/g)(x) &= \frac{\lim_{x \to x_0} f(x)}{\lim_{x \to x_0} g(x)}
\end{aligned}$$

(where we have dropped the restriction $x \in E$ for brevity) but bear in mind that these 
identities are only true when the right-hand side makes sense, and furthermore for the 
final identity we need g to be non-zero, and also $\lim_{x \to x_0} g(x)$ to be non-zero. (See 
Example 1.2.4 for some examples of what goes wrong when limits are manipulated 
carelessly.) 


Using the limit laws in Proposition 9.3.14 we can already deduce several limits.
First of all, it is easy to check the basic limits

$$\lim_{x \to x_0;\, x \in \mathbf{R}} c = c$$

and

$$\lim_{x \to x_0;\, x \in \mathbf{R}} x = x_0$$

for any real numbers $x_0$ and $c$. (Why? Use Proposition 9.3.9.) By the limit laws we
can thus conclude that

$$\lim_{x \to x_0;\, x \in \mathbf{R}} x^2 = x_0^2$$

$$\lim_{x \to x_0;\, x \in \mathbf{R}} cx = c x_0$$

$$\lim_{x \to x_0;\, x \in \mathbf{R}} x^2 + cx + d = x_0^2 + c x_0 + d$$

etc., where $c$, $d$ are arbitrary real numbers.
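As a numerical sanity check (not a proof), one can evaluate $x^2 + cx + d$ along a sequence $a_n = x_0 + 1/n$ converging to $x_0$ and watch the values approach $x_0^2 + c x_0 + d$, in the spirit of Proposition 9.3.9. The helper name `values_along_sequence` and the particular constants below are illustrative assumptions, not part of the text.

```python
def values_along_sequence(f, x0, ns):
    """Evaluate f at the points a_n = x0 + 1/n, which converge to x0."""
    return [f(x0 + 1.0 / n) for n in ns]

# Illustrative constants (assumptions chosen only for this sketch).
c, d, x0 = 3.0, -1.0, 2.0

def p(x):
    return x * x + c * x + d

# Value predicted by the limit laws: x0^2 + c*x0 + d.
predicted = x0 * x0 + c * x0 + d

# Gap between p(a_n) and the predicted limit for increasing n.
errors = [abs(v - predicted)
          for v in values_along_sequence(p, x0, [10, 100, 1000, 10000])]
print(errors)  # the gaps shrink as n grows
```

A single sequence can only illustrate the limit; the definition quantifies over every sequence in the domain converging to $x_0$.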

If $f$ converges to $L$ at $x_0$ in $X$, and $Y$ is any subset of $X$ such that $x_0$ is still
an adherent point of $Y$, then $f$ will also converge to $L$ at $x_0$ in $Y$ (why?). Thus
convergence on a large set implies convergence on a smaller set. The converse,
however, is not true:


Example 9.3.16 Consider the signum function $\operatorname{sgn}: \mathbf{R} \to \mathbf{R}$, defined by

$$\operatorname{sgn}(x) := \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0. \end{cases}$$

Then $\lim_{x \to 0;\, x \in (0,\infty)} \operatorname{sgn}(x) = 1$ (why?), whereas $\lim_{x \to 0;\, x \in (-\infty,0)} \operatorname{sgn}(x) = -1$ (why?) and
$\lim_{x \to 0;\, x \in \mathbf{R}} \operatorname{sgn}(x)$ is undefined (why?). Thus it is sometimes dangerous to drop the
set $E$ from the notation of limit. However, in many cases it is safe to do so; for instance,
since we know that $\lim_{x \to x_0;\, x \in \mathbf{R}} x^2 = x_0^2$, we know in fact that $\lim_{x \to x_0;\, x \in X} x^2 = x_0^2$
for any set $X$ with $x_0$ as an adherent point (why?). Thus it is safe to write $\lim_{x \to x_0} x^2 = x_0^2$.
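The dependence on the set $E$ can be seen concretely by evaluating the signum function along three sequences converging to $0$: one from the right, one from the left, and one alternating in sign. This is only an illustrative sketch; the sequence choices are assumptions made for the example.

```python
def sgn(x):
    """Signum function: 1 for x > 0, 0 for x = 0, -1 for x < 0."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0

# a_n = 1/n lies in (0, infinity) and converges to 0.
right = [sgn(1.0 / n) for n in range(1, 6)]
# b_n = -1/n lies in (-infinity, 0) and converges to 0.
left = [sgn(-1.0 / n) for n in range(1, 6)]
# c_n = (-1)^n / n lies in R, converges to 0, but sgn(c_n) keeps oscillating.
alternating = [sgn((-1) ** n / n) for n in range(1, 7)]

print(right, left, alternating)
```

The first two lists are constant, matching the limits on $(0,\infty)$ and $(-\infty,0)$, while the third does not settle down, which is why no limit exists on all of $\mathbf{R}$.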


Example 9.3.17 Let $f(x)$ be the function

$$f(x) := \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \neq 0. \end{cases}$$

Then $\lim_{x \to 0;\, x \in \mathbf{R} \setminus \{0\}} f(x) = 0$ (why?), but $\lim_{x \to 0;\, x \in \mathbf{R}} f(x)$ is undefined (why?).
(When this happens, we say that $f$ has a “removable singularity” or “removable
discontinuity” at $0$. Because of such singularities, it is sometimes the convention
when writing $\lim_{x \to x_0} f(x)$ to automatically exclude $x_0$ from the set; for instance, in
some textbooks, $\lim_{x \to x_0} f(x)$ is used as shorthand for $\lim_{x \to x_0;\, x \in X \setminus \{x_0\}} f(x)$.)


On the other hand, the limit at $x_0$ should only depend on the values of the function
near $x_0$; the values away from $x_0$ are not relevant. The following proposition reflects
this intuition:


Proposition 9.3.18 (Limits are local) Let $X$ be a subset of $\mathbf{R}$, let $E$ be a subset of
$X$, let $x_0$ be an adherent point of $E$, let $f: X \to \mathbf{R}$ be a function, and let $L$ be a real
number. Let $\delta > 0$. Then we have

$$\lim_{x \to x_0;\, x \in E} f(x) = L$$

if and only if

$$\lim_{x \to x_0;\, x \in E \cap (x_0 - \delta, x_0 + \delta)} f(x) = L.$$

Proof See Exercise 9.3.3.


Informally, the above proposition asserts that

$$\lim_{x \to x_0;\, x \in E} f(x) = \lim_{x \to x_0;\, x \in E \cap (x_0 - \delta, x_0 + \delta)} f(x).$$

Thus the limit of a function at $x_0$, if it exists, only depends on the values of $f$ near
$x_0$; the values far away do not actually influence the limit.
We now give a few more examples of limits.

Example 9.3.19 Consider the functions $f: \mathbf{R} \to \mathbf{R}$ and $g: \mathbf{R} \to \mathbf{R}$ defined by
$f(x) := x + 2$ and $g(x) := x + 1$. Then $\lim_{x \to 2;\, x \in \mathbf{R}} f(x) = 4$ and $\lim_{x \to 2;\, x \in \mathbf{R}} g(x) = 3$.
We would like to use the limit laws to conclude that $\lim_{x \to 2;\, x \in \mathbf{R}} f(x)/g(x) = 4/3$,
or in other words that $\lim_{x \to 2;\, x \in \mathbf{R}} \frac{x+2}{x+1} = \frac{4}{3}$. Strictly speaking, we cannot use
Proposition 9.3.14 to ensure this, because $x + 1$ is zero at $x = -1$, and so $f(x)/g(x)$ is
not defined there. However, this is easily solved, by restricting the domain of $f$ and $g$ from
$\mathbf{R}$ to a smaller domain, such as $\mathbf{R} \setminus \{-1\}$. Then Proposition 9.3.14 does apply, and we
have $\lim_{x \to 2;\, x \in \mathbf{R} \setminus \{-1\}} \frac{x+2}{x+1} = \frac{4}{3}$.


Example 9.3.20 Consider the function $f: \mathbf{R} \setminus \{1\} \to \mathbf{R}$ defined by $f(x) := (x^2 - 1)/(x - 1)$.
This function is well-defined for every real number except $1$, so $f(1)$
is undefined. However, $1$ is still an adherent point of $\mathbf{R} \setminus \{1\}$ (why?), and the limit
$\lim_{x \to 1;\, x \in \mathbf{R} \setminus \{1\}} f(x)$ is still defined. This is because on the domain $\mathbf{R} \setminus \{1\}$ we have the
identity $(x^2 - 1)/(x - 1) = (x + 1)(x - 1)/(x - 1) = x + 1$, and
$\lim_{x \to 1;\, x \in \mathbf{R} \setminus \{1\}} x + 1 = 2$.


Example 9.3.21 Let $f: \mathbf{R} \to \mathbf{R}$ be the function

$$f(x) := \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

We will show that $f(x)$ has no limit at $0$ in $\mathbf{R}$. Suppose for sake of contradiction that
$f(x)$ had some limit $L$ at $0$ in $\mathbf{R}$. Then we would have $\lim_{n \to \infty} f(a_n) = L$ whenever
$(a_n)_{n=1}^\infty$ is a sequence of non-zero numbers converging to $0$. Since $(1/n)_{n=1}^\infty$ is such
a sequence, we would have

$$L = \lim_{n \to \infty} f(1/n) = \lim_{n \to \infty} 1 = 1.$$

On the other hand, since $(\sqrt{2}/n)_{n=1}^\infty$ is another sequence of non-zero numbers
converging to $0$ (but now these numbers are irrational instead of rational), we have

$$L = \lim_{n \to \infty} f(\sqrt{2}/n) = \lim_{n \to \infty} 0 = 0.$$

Since $1 \neq 0$, we have a contradiction. Thus this function does not have a limit at $0$.
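Floating-point numbers cannot distinguish rationals from irrationals, so a direct numerical experiment with this function is not meaningful. As a hedged alternative, the sketch below models only numbers of the form $p + q\sqrt{2}$ with $p, q$ rational (such a number is rational exactly when $q = 0$) and evaluates the function exactly along the two sequences used above. The pair representation and the name `indicator_of_rationals` are assumptions introduced for this illustration.

```python
from fractions import Fraction

def indicator_of_rationals(p, q):
    """Value of f at the number p + q*sqrt(2), where p and q are rationals.

    Since sqrt(2) is irrational, p + q*sqrt(2) is rational iff q == 0.
    """
    return 1 if q == 0 else 0

# The sequence 1/n corresponds to the pairs (1/n, 0).
rational_seq = [indicator_of_rationals(Fraction(1, n), Fraction(0))
                for n in range(1, 6)]
# The sequence sqrt(2)/n corresponds to the pairs (0, 1/n).
irrational_seq = [indicator_of_rationals(Fraction(0), Fraction(1, n))
                  for n in range(1, 6)]

print(rational_seq, irrational_seq)
```

Both sequences of inputs converge to $0$, yet the outputs are constantly $1$ and constantly $0$ respectively, which is the contradiction exploited in the example.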


— Exercises — 
Exercise 9.3.1 Prove Proposition 9.3.9. 
Exercise 9.3.2. Prove the remaining claims in Proposition 9.3.14. 
Exercise 9.3.3 Prove Proposition 9.3.18. 
Exercise 9.3.4 Propose a definition for limit superior $\limsup_{x \to x_0;\, x \in E} f(x)$ and limit inferior
$\liminf_{x \to x_0;\, x \in E} f(x)$, and then propose an analogue of Proposition 9.3.9 for your definition. (For
an additional challenge: prove that analogue.)

Exercise 9.3.5 (Continuous version of squeeze test) Let $X$ be a subset of $\mathbf{R}$, let $E$ be a subset of
$X$, let $x_0$ be an adherent point of $E$, and let $f: X \to \mathbf{R}$, $g: X \to \mathbf{R}$, $h: X \to \mathbf{R}$ be functions such
that $f(x) \leq g(x) \leq h(x)$ for all $x \in E$. If we have $\lim_{x \to x_0;\, x \in E} f(x) = \lim_{x \to x_0;\, x \in E} h(x) = L$
for some real number $L$, show that $\lim_{x \to x_0;\, x \in E} g(x) = L$.


9.4 Continuous Functions 


We now introduce one of the most fundamental notions in the theory of functions - 
that of continuity. 


Definition 9.4.1 (Continuity) Let $X$ be a subset of $\mathbf{R}$, and let $f: X \to \mathbf{R}$ be a function.
Let $x_0$ be an element of $X$. We say that $f$ is continuous at $x_0$ iff we have

$$\lim_{x \to x_0;\, x \in X} f(x) = f(x_0);$$

in other words, the limit of $f(x)$ as $x$ converges to $x_0$ in $X$ exists and is equal to
$f(x_0)$. We say that $f$ is continuous on $X$ (or simply continuous) iff $f$ is continuous
at $x_0$ for every $x_0 \in X$. We say that $f$ is discontinuous at $x_0$ iff it is not continuous
at $x_0$.

We also extend these notions to functions $f: X \to Y$ that take values in a subset
$Y$ of $\mathbf{R}$, by identifying such functions (by abuse of notation) with the function
$\tilde{f}: X \to \mathbf{R}$ that agrees everywhere with $f$ (so $\tilde{f}(x) = f(x)$ for all $x \in X$) but where
the codomain has been enlarged from $Y$ to $\mathbf{R}$.


Example 9.4.2 Let $c$ be a real number, and let $f: \mathbf{R} \to \mathbf{R}$ be the constant function
$f(x) := c$. Then for every real number $x_0 \in \mathbf{R}$, we have

$$\lim_{x \to x_0;\, x \in \mathbf{R}} f(x) = \lim_{x \to x_0;\, x \in \mathbf{R}} c = c = f(x_0),$$

thus $f$ is continuous at every point $x_0 \in \mathbf{R}$, or in other words $f$ is continuous on $\mathbf{R}$.


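Example 9.4.3 Let $f: \mathbf{R} \to \mathbf{R}$ be the identity function $f(x) := x$. Then for every
real number $x_0 \in \mathbf{R}$, we have

$$\lim_{x \to x_0;\, x \in \mathbf{R}} f(x) = \lim_{x \to x_0;\, x \in \mathbf{R}} x = x_0 = f(x_0),$$

thus $f$ is continuous at every point $x_0 \in \mathbf{R}$, or in other words $f$ is continuous on $\mathbf{R}$.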


Example 9.4.4 Let $\operatorname{sgn}: \mathbf{R} \to \mathbf{R}$ be the signum function defined in Example 9.3.16.
Then $\operatorname{sgn}(x)$ is continuous at every non-zero value of $x$; for instance, at $1$, we have
(using Proposition 9.3.18)

$$\lim_{x \to 1;\, x \in \mathbf{R}} \operatorname{sgn}(x) = \lim_{x \to 1;\, x \in (0.9, 1.1)} \operatorname{sgn}(x) = \lim_{x \to 1;\, x \in (0.9, 1.1)} 1 = 1 = \operatorname{sgn}(1).$$

On the other hand, $\operatorname{sgn}$ is not continuous at $0$, since the limit $\lim_{x \to 0;\, x \in \mathbf{R}} \operatorname{sgn}(x)$ does
not exist.


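Example 9.4.5 Let $f: \mathbf{R} \to \mathbf{R}$ be the function

$$f(x) := \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

Then by the discussion in the previous section, $f$ is not continuous at $0$. In fact, it
turns out that $f$ is not continuous at any real number $x_0$ (can you see why?).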


Example 9.4.6 Let $f: \mathbf{R} \to \mathbf{R}$ be the function

$$f(x) := \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0. \end{cases}$$

Then $f$ is continuous at every non-zero real number (why?), but is not continuous
at $0$. However, if we restrict $f$ to the right-hand line $[0, \infty)$, then the resulting
function $f|_{[0,\infty)}$ now becomes continuous everywhere in its domain, including $0$. Thus
restricting the domain of a function can make a discontinuous function continuous
again.


There are several ways to phrase the statement that “$f$ is continuous at $x_0$”:

Proposition 9.4.7 (Equivalent formulations of continuity) Let $X$ be a subset of $\mathbf{R}$,
let $f: X \to \mathbf{R}$ be a function, and let $x_0$ be an element of $X$. Then the following four
statements are logically equivalent:

(a) $f$ is continuous at $x_0$.
(b) For every sequence $(a_n)_{n=0}^\infty$ consisting of elements of $X$ with $\lim_{n \to \infty} a_n = x_0$,
we have $\lim_{n \to \infty} f(a_n) = f(x_0)$.
(c) For every $\varepsilon > 0$, there exists a $\delta > 0$ such that $|f(x) - f(x_0)| < \varepsilon$ for all $x \in X$
with $|x - x_0| < \delta$.
(d) For every $\varepsilon > 0$, there exists a $\delta > 0$ such that $|f(x) - f(x_0)| \leq \varepsilon$ for all $x \in X$
with $|x - x_0| \leq \delta$.

Proof See Exercise 9.4.1.
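To make the $\varepsilon$-$\delta$ formulation (c) concrete, here is a small sketch for the particular function $f(x) = x^2$. It uses the standard estimate $|x^2 - x_0^2| = |x - x_0|\,|x + x_0| < \delta\,(2|x_0| + 1)$ whenever $|x - x_0| < \delta \leq 1$, which suggests the choice $\delta := \min(1, \varepsilon/(2|x_0| + 1))$. The function names `delta_for_square` and `check_on_grid` are hypothetical, and testing finitely many sample points is of course only a sanity check, not a proof.

```python
def delta_for_square(x0, eps):
    """A delta that works for f(x) = x^2 at x0 (one valid choice, not the largest)."""
    return min(1.0, eps / (2.0 * abs(x0) + 1.0))

def check_on_grid(x0, eps, samples=1000):
    """Check |x^2 - x0^2| < eps at sample points x with |x - x0| < delta."""
    delta = delta_for_square(x0, eps)
    for k in range(-samples + 1, samples):
        x = x0 + delta * k / samples  # |k| < samples, so |x - x0| < delta
        if abs(x * x - x0 * x0) >= eps:
            return False
    return True

print(check_on_grid(3.0, 0.01), check_on_grid(0.0, 1e-6), check_on_grid(-2.0, 0.5))
```

Note that $\delta$ depends on both $\varepsilon$ and $x_0$ here; the definition of continuity at a point allows this.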


Remark 9.4.8 A particularly useful consequence of Proposition 9.4.7 is the following:
if $f$ is continuous at $x_0$, and $a_n \to x_0$ as $n \to \infty$, then $f(a_n) \to f(x_0)$ as
$n \to \infty$ (provided that all the elements of the sequence $(a_n)_{n=0}^\infty$ lie in the domain of
$f$, of course). Thus continuous functions are very useful in computing limits.

The limit laws in Proposition 9.3.14, combined with the definition of continuity
in Definition 9.4.1, immediately imply


Proposition 9.4.9 (Arithmetic preserves continuity) Let $X$ be a subset of $\mathbf{R}$, and
let $f: X \to \mathbf{R}$ and $g: X \to \mathbf{R}$ be functions. Let $x_0 \in X$. Then if $f$ and $g$ are both
continuous at $x_0$, then the functions $f + g$, $f - g$, $\max(f, g)$, $\min(f, g)$ and $fg$ are
also continuous at $x_0$. If $g$ is non-zero on $X$, then $f/g$ is also continuous at $x_0$.


In particular, the sum, difference, maximum, minimum, and product of continuous 
functions are continuous; and the quotient of two continuous functions is continuous 
as long as the denominator does not become zero. 

One can use Proposition 9.4.9 to show that a lot of functions are continuous.
For instance, just by starting from the fact that constant functions are continuous,
and the identity function $f(x) = x$ is continuous (Exercise 9.4.2), one can show
that the function $g(x) := \max(x^3 + 4x^2 + x + 5, x^4 - x^3)/(x^2 - 4)$
is continuous at every point of $\mathbf{R}$ except the two points $x = +2$, $x = -2$ where the
denominator vanishes.

Some other examples of continuous functions are given below. 


Proposition 9.4.10 (Exponentiation is continuous, I) Let $a > 0$ be a positive real
number. Then the function $f: \mathbf{R} \to \mathbf{R}$ defined by $f(x) := a^x$ is continuous.

Proof See Exercise 9.4.3.

Proposition 9.4.11 (Exponentiation is continuous, II) Let $p$ be a real number. Then
the function $f: (0, \infty) \to \mathbf{R}$ defined by $f(x) := x^p$ is continuous.


Proof See Exercise 9.4.4. 


There is a stronger statement than Propositions 9.4.10 and 9.4.11, namely that 
exponentiation is jointly continuous in both the exponent and the base, but this is 
harder to show; see Exercise 4.5.10. 


Proposition 9.4.12 (Absolute value is continuous) The function $f: \mathbf{R} \to \mathbf{R}$ defined
by $f(x) := |x|$ is continuous.

Proof This follows since $|x| = \max(x, -x)$ and the functions $x$, $-x$ are already
continuous.


The class of continuous functions is not only closed under addition, subtraction, 
multiplication, and division, but is also closed under composition: 


Proposition 9.4.13 (Composition preserves continuity) Let $X$ and $Y$ be subsets of
$\mathbf{R}$, and let $f: X \to Y$ and $g: Y \to \mathbf{R}$ be functions. Let $x_0$ be a point in $X$. If $f$ is
continuous at $x_0$, and $g$ is continuous at $f(x_0)$, then the composition $g \circ f: X \to \mathbf{R}$
is continuous at $x_0$.

Proof See Exercise 9.4.5.

Example 9.4.14 Since the function $f(x) := 3x + 1$ is continuous on all of $\mathbf{R}$, and
the function $g(x) := 5^x$ is continuous on all of $\mathbf{R}$, the function $g \circ f(x) = 5^{3x+1}$ is
continuous on all of $\mathbf{R}$. By several applications of the above propositions, one can
show that far more complicated functions, e.g., $h(x) := |x^2 - 8x + 7|^{\sqrt{2}}/(x^2 + 1)$,
are also continuous. (Why is this function continuous?) There are still a few functions
though that are not yet easy to test for continuity, such as $k(x) := x^x$; this function
can be dealt with more easily once we have the machinery of logarithms, which we
will see in Sect. 4.5 of Analysis II.


— Exercises — 


Exercise 9.4.1 Prove Proposition 9.4.7. (Hint: this can largely be done by applying the previous 
propositions and lemmas. Note that to prove (a),(b), and (c) are equivalent, you do not have to prove 
all six implications, but you do have to prove at least three; for instance, showing that (a) implies 
(b), (b) implies (c), and (c) implies (a) will suffice, although this is not necessarily the shortest or 
simplest way to do this question.) 


Exercise 9.4.2 Let $X$ be a subset of $\mathbf{R}$, and let $c \in \mathbf{R}$. Show that the constant function $f: X \to \mathbf{R}$
defined by $f(x) := c$ is continuous, and show that the identity function $g: X \to \mathbf{R}$ defined by
$g(x) := x$ is also continuous.


Exercise 9.4.3 Prove Proposition 9.4.10. (Hint: you can use Lemma 6.5.3, combined with the 
squeeze test (Corollary 6.4.14) and Proposition 6.7.3.) 


Exercise 9.4.4 Prove Proposition 9.4.11. (Hint: from limit laws (Proposition 9.3.14) one can show
that $\lim_{x \to 1} x^n = 1$ for all integers $n$. From this and the squeeze test (Corollary 6.4.14) deduce that
$\lim_{x \to 1} x^p = 1$ for all real numbers $p$. Finally, apply Proposition 6.7.3.)


Exercise 9.4.5 Prove Proposition 9.4.13. 


Exercise 9.4.6 Let $X$ be a subset of $\mathbf{R}$, and let $f: X \to \mathbf{R}$ be a continuous function. If $Y$ is a subset
of $X$, show that the restriction $f|_Y: Y \to \mathbf{R}$ of $f$ to $Y$ is also a continuous function. (Hint: this is
a simple result, but it requires you to follow the definitions carefully.)

Exercise 9.4.7 Let $n \geq 0$ be an integer, and for each $0 \leq i \leq n$ let $c_i$ be a real number. Let
$P: \mathbf{R} \to \mathbf{R}$ be the function

$$P(x) := \sum_{i=0}^{n} c_i x^i;$$

such a function is known as a polynomial of one variable; a typical example is $P(x) = 6x^4 - 3x^2 + 4$.
Show that $P$ is continuous.


9.5 Left and Right Limits 


We now introduce the notion of left and right limits, which can be thought of as two
separate “halves” of the complete limit $\lim_{x \to x_0;\, x \in X} f(x)$.

Definition 9.5.1 (Left and right limits) Let $X$ be a subset of $\mathbf{R}$, $f: X \to \mathbf{R}$ be a
function, and let $x_0$ be a real number. If $x_0$ is an adherent point of $X \cap (x_0, \infty)$, then
we define the right limit $f(x_0+)$ of $f$ at $x_0$ by the formula

$$f(x_0+) := \lim_{x \to x_0;\, x \in X \cap (x_0, \infty)} f(x),$$

provided of course that this limit exists. Similarly, if $x_0$ is an adherent point of
$X \cap (-\infty, x_0)$, then we define the left limit $f(x_0-)$ of $f$ at $x_0$ by the formula

$$f(x_0-) := \lim_{x \to x_0;\, x \in X \cap (-\infty, x_0)} f(x),$$

again provided that the limit exists. (Thus in many cases $f(x_0+)$ and $f(x_0-)$ will
not be defined.)

Sometimes we use the shorthand notations

$$\lim_{x \to x_0+} f(x) := \lim_{x \to x_0;\, x \in X \cap (x_0, \infty)} f(x);$$

$$\lim_{x \to x_0-} f(x) := \lim_{x \to x_0;\, x \in X \cap (-\infty, x_0)} f(x)$$

when the domain $X$ of $f$ is clear from context.


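Example 9.5.2 Consider the signum function $\operatorname{sgn}: \mathbf{R} \to \mathbf{R}$ defined in Example
9.3.16. We have

$$\operatorname{sgn}(0+) = \lim_{x \to 0;\, x \in \mathbf{R} \cap (0, \infty)} \operatorname{sgn}(x) = \lim_{x \to 0;\, x \in \mathbf{R} \cap (0, \infty)} 1 = 1$$

and

$$\operatorname{sgn}(0-) = \lim_{x \to 0;\, x \in \mathbf{R} \cap (-\infty, 0)} \operatorname{sgn}(x) = \lim_{x \to 0;\, x \in \mathbf{R} \cap (-\infty, 0)} -1 = -1,$$

while $\operatorname{sgn}(0) = 0$ by definition.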


Note that $f$ does not necessarily have to be defined at $x_0$ in order for $f(x_0+)$ or
$f(x_0-)$ to be defined. For instance, if $f: \mathbf{R} \setminus \{0\} \to \mathbf{R}$ is the function $f(x) := x/|x|$,
then $f(0+) = 1$ and $f(0-) = -1$ (why?), even though $f(0)$ is undefined.

From Proposition 9.3.9 we see that if the right limit $f(x_0+)$ exists, and $(a_n)_{n=0}^\infty$
is a sequence in $X$ converging to $x_0$ from the right (i.e., $a_n > x_0$ for all $n \in \mathbf{N}$), then
$\lim_{n \to \infty} f(a_n) = f(x_0+)$. Similarly, if $(b_n)_{n=0}^\infty$ is a sequence converging to $x_0$ from
the left (i.e., $b_n < x_0$ for all $n \in \mathbf{N}$), then $\lim_{n \to \infty} f(b_n) = f(x_0-)$.

Let $x_0$ be an adherent point of both $X \cap (x_0, \infty)$ and $X \cap (-\infty, x_0)$. If $f$ is
continuous at $x_0$, it is clear from Proposition 9.4.7 that $f(x_0+)$ and $f(x_0-)$ both
exist and are equal to $f(x_0)$. (Can you see why?) A converse is also true (compare
this with Proposition 6.4.12(f)):


Proposition 9.5.3 Let $X$ be a subset of $\mathbf{R}$ containing a real number $x_0$, and suppose
that $x_0$ is an adherent point of both $X \cap (x_0, \infty)$ and $X \cap (-\infty, x_0)$. Let $f: X \to \mathbf{R}$
be a function. If $f(x_0+)$ and $f(x_0-)$ both exist and are both equal to $f(x_0)$, then $f$
is continuous at $x_0$.

Proof Let us write $L := f(x_0)$. Then by hypothesis we have

$$\lim_{x \to x_0;\, x \in X \cap (x_0, \infty)} f(x) = L \qquad (9.1)$$

and

$$\lim_{x \to x_0;\, x \in X \cap (-\infty, x_0)} f(x) = L. \qquad (9.2)$$

Let $\varepsilon > 0$ be given. From (9.1), Definition 9.3.6, and Definition 9.3.3 (applied to
the restriction of $f$ to $X \cap (x_0, +\infty)$), we know that there exists a $\delta_+ > 0$ such that
$|f(x) - L| < \varepsilon$ for all $x \in X \cap (x_0, \infty)$ for which $|x - x_0| < \delta_+$. From (9.2) we
similarly know that there exists a $\delta_- > 0$ such that $|f(x) - L| < \varepsilon$ for all
$x \in X \cap (-\infty, x_0)$ for which $|x - x_0| < \delta_-$. Now let $\delta := \min(\delta_-, \delta_+)$; then $\delta > 0$ (why?),
and suppose that $x \in X$ is such that $|x - x_0| < \delta$. Then there are three cases: $x > x_0$,
$x = x_0$, and $x < x_0$, but in all three cases we know that $|f(x) - L| < \varepsilon$. (Why? The
reason is different in each of the three cases.) By Proposition 9.4.7 we thus have that
$f$ is continuous at $x_0$, as desired.


As we saw with the signum function in Example 9.3.16, it is possible for the left
and right limits $f(x_0-)$, $f(x_0+)$ of a function $f$ at a point $x_0$ to both exist, but not
be equal to each other; when this happens, we say that $f$ has a jump discontinuity at
$x_0$. Thus, for instance, the signum function has a jump discontinuity at zero. Also, it
is possible for the left and right limits $f(x_0-)$, $f(x_0+)$ to exist and be equal to each
other, but not be equal to $f(x_0)$; when this happens we say that $f$ has a removable
discontinuity (or removable singularity) at $x_0$. For instance, if we take $f: \mathbf{R} \to \mathbf{R}$
to be the function

$$f(x) := \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \neq 0, \end{cases}$$

then $f(0+)$ and $f(0-)$ both exist and equal $0$ (why?), but $f(0)$ equals $1$; thus $f$ has
a removable discontinuity at $0$.


Remark 9.5.4 Jump discontinuities and removable discontinuities are not the only
way a function can be discontinuous. Another way is for a function to go to infinity at
the discontinuity: for instance, the function $f: \mathbf{R} \setminus \{0\} \to \mathbf{R}$ defined by $f(x) := 1/x$
has a discontinuity at $0$ which is neither a jump discontinuity nor a removable
singularity; informally, $f(x)$ converges to $+\infty$ when $x$ approaches $0$ from the right and
converges to $-\infty$ when $x$ approaches $0$ from the left. These types of singularities
are sometimes known as asymptotic discontinuities. There are also oscillatory
discontinuities, where the function remains bounded but still does not have a limit near
$x_0$. For instance, the function $f: \mathbf{R} \to \mathbf{R}$ defined by

$$f(x) := \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q} \end{cases}$$

has an oscillatory discontinuity at $0$ (and in fact at any other real number also). This
is because the function does not have left or right limits at $0$, despite the fact that the
function is bounded.

The study of discontinuities (also called singularities) continues further, but is 
beyond the scope of this text. For instance, singularities play a key role in complex 
analysis. 


— Exercises — 


Exercise 9.5.1 Let $E$ be a subset of $\mathbf{R}$, let $f: E \to \mathbf{R}$ be a function, and let $x_0$ be an adherent point
of $E$. Write down a definition of what it would mean for the limit $\lim_{x \to x_0;\, x \in E} f(x)$ to exist and
equal $+\infty$ or $-\infty$. If $f: \mathbf{R} \setminus \{0\} \to \mathbf{R}$ is the function $f(x) := 1/x$, use your definition to conclude
$f(0+) = +\infty$ and $f(0-) = -\infty$. Also, state and prove some analogue of Proposition 9.3.9 when
$L = +\infty$ or $L = -\infty$.


9.6 The Maximum Principle 


In the previous two sections we saw that a large number of functions were continuous, 
though certainly not all functions were continuous. We now show that continuous 
functions enjoy a number of other useful properties, especially if their domain is a 
closed interval. It is here that we shall begin exploiting the full power of the Heine— 
Borel theorem (Theorem 9.1.24). 


Definition 9.6.1 Let $X$ be a subset of $\mathbf{R}$, and let $f: X \to \mathbf{R}$ be a function. We say
that $f$ is bounded from above iff there exists a real number $M$ such that $f(x) \leq M$
for all $x \in X$. We say that $f$ is bounded from below iff there exists a real number $M$
such that $f(x) \geq -M$ for all $x \in X$. We say that $f$ is bounded iff there exists a real
number $M$ such that $|f(x)| \leq M$ for all $x \in X$.


Remark 9.6.2 A function is bounded if and only if it is bounded both from above 
and below. (Why? Note that one part of the “if and only if” is slightly trickier than 
the other.) Also, a function f: X — R is bounded if and only if its image f(X) is 
a bounded set in the sense of Definition 9.1.22 (why?). 


Not all continuous functions are bounded. For instance, the function f(x) := x 
on the domain R is continuous but unbounded (why?), although it is bounded on 
some smaller domains, such as [1, 2]. The function f(x) := 1/x is continuous but 
unbounded on (0, 1) (why?), though it is continuous and bounded on [1, 2] (why?). 
However, if the domain of the continuous function is a closed and bounded interval, 
then we do have boundedness: 


Lemma 9.6.3 Let $a < b$ be real numbers, and let $f: [a, b] \to \mathbf{R}$ be a function
continuous on $[a, b]$. Then $f$ is a bounded function.

Proof Suppose for sake of contradiction that $f$ is not bounded. Thus for every real
number $M$ there exists an element $x \in [a, b]$ such that $|f(x)| > M$.
In particular, for every natural number $n$, the set $\{x \in [a, b] : |f(x)| \geq n\}$ is
non-empty. We can thus choose² a sequence $(x_n)_{n=0}^\infty$ in $[a, b]$ such that $|f(x_n)| \geq n$ for all
$n$. This sequence lies in $[a, b]$, and so by Theorem 9.1.24 there exists a subsequence
$(x_{n_j})_{j=0}^\infty$ which converges to some limit $L \in [a, b]$, where $n_0 < n_1 < n_2 < \ldots$ is
an increasing sequence of natural numbers. In particular, we see that $n_j \geq j$ for all
$j \in \mathbf{N}$ (why? Use induction).

Since $f$ is continuous on $[a, b]$, it is continuous at $L$, and in particular we see that

$$\lim_{j \to \infty} f(x_{n_j}) = f(L).$$

Thus the sequence $(f(x_{n_j}))_{j=0}^\infty$ is convergent, and hence it is bounded. On the other
hand, we know from the construction that $|f(x_{n_j})| \geq n_j \geq j$ for all $j$, and hence the
sequence $(f(x_{n_j}))_{j=0}^\infty$ is not bounded, a contradiction.


Remark 9.6.4 There are two things about this proof that are worth noting. Firstly, it 
shows how useful the Heine—Borel theorem (Theorem 9.1.24) is. Secondly, it is an 
indirect proof; it doesn’t say how to find the bound for f, but it shows that having f 
unbounded leads to a contradiction. 


We now improve Lemma 9.6.3 to say something more. 


Definition 9.6.5 (Maxima and minima) Let $X$ be a set, let $f: X \to \mathbf{R}$ be a function,
and let $x_0 \in X$. We say that $f$ attains its maximum at $x_0$ if we have $f(x_0) \geq f(x)$
for all $x \in X$ (i.e., the value of $f$ at the point $x_0$ is larger than or equal to the value
of $f$ at any other point in $X$). We say that $f$ attains its minimum at $x_0$ if we have
$f(x_0) \leq f(x)$ for all $x \in X$.


Remark 9.6.6 If a function attains its maximum somewhere, then it must be bounded
from above (why?). Similarly if it attains its minimum somewhere, then it must be
bounded from below. These notions of maxima and minima are global; local versions
will be defined in Definition 10.2.1.


Proposition 9.6.7 (Maximum principle) Let $a < b$ be real numbers, and let $f: [a, b] \to \mathbf{R}$
be a function continuous on $[a, b]$. Then $f$ attains its maximum at some point
$x_{\max} \in [a, b]$ and also attains its minimum at some point $x_{\min} \in [a, b]$.


Remark 9.6.8 Strictly speaking, “maximum principle” is a misnomer, since the prin- 
ciple also concerns the minimum. Perhaps a more precise name would have been 
“extremum principle”; the word “extremum” is used to denote either a maximum or 
a minimum. 


Proof We shall just show that f attains its maximum somewhere; the proof that it 
attains its minimum also is similar but is left to the reader. 


² Strictly speaking, the choice of the sequence $(x_n)_{n=0}^\infty$ in the proof of Lemma 9.6.3 requires
the axiom of choice, as in Lemma 8.4.5. However, one can also proceed without the axiom of choice,
by defining $x_n := \sup\{x \in [a, b] : |f(x)| \geq n\}$, and using the
continuity of $f$ to show that $|f(x_n)| \geq n$. We leave the details to the reader.

From Lemma 9.6.3 we know that $f$ is bounded, thus there exists an $M$ such that
$-M \leq f(x) \leq M$ for each $x \in [a, b]$. Now let $E$ denote the set

$$E := \{f(x) : x \in [a, b]\}.$$

(In other words, $E := f([a, b])$.) By what we just said, this set is a subset of
$[-M, M]$. It is also non-empty, since it contains for instance the point $f(a)$. Hence
by the least upper bound principle, it has a supremum $\sup(E)$ which is a real number.

Write $m := \sup(E)$. By definition of supremum, we know that $y \leq m$ for all
$y \in E$; by definition of $E$, this means that $f(x) \leq m$ for all $x \in [a, b]$. Thus to show
that $f$ attains its maximum somewhere, it will suffice to find an $x_{\max} \in [a, b]$ such
that $f(x_{\max}) = m$. (Why will this suffice?)

Let $n \geq 1$ be any integer. Then $m - \frac{1}{n} < m = \sup(E)$. As $\sup(E)$ is the least
upper bound for $E$, $m - \frac{1}{n}$ cannot be an upper bound for $E$, thus there exists a $y \in E$
such that $m - \frac{1}{n} < y$. By definition of $E$, this implies that there exists an $x \in [a, b]$
such that $m - \frac{1}{n} < f(x)$.

We now choose a sequence $(x_n)_{n=1}^\infty$ by choosing, for each $n$, $x_n$ to be an element of
$[a, b]$ such that $m - \frac{1}{n} < f(x_n)$. (Again, this requires the axiom of choice; however it
is possible to prove this principle without the axiom of choice. For instance, you will
see a better proof of this proposition using the notion of compactness in Proposition
2.3.2.) This is a sequence in $[a, b]$; by the Heine–Borel theorem (Theorem 9.1.24),
we can thus find a subsequence $(x_{n_j})_{j=1}^\infty$, where $n_1 < n_2 < \ldots$, which converges to
some limit $x_{\max} \in [a, b]$. Since $(x_{n_j})_{j=1}^\infty$ converges to $x_{\max}$, and $f$ is continuous at
$x_{\max}$, we have as before that

$$\lim_{j \to \infty} f(x_{n_j}) = f(x_{\max}).$$

On the other hand, by construction we know that

$$f(x_{n_j}) > m - \frac{1}{n_j} \geq m - \frac{1}{j},$$

and so by taking limits of both sides we see that

$$f(x_{\max}) = \lim_{j \to \infty} f(x_{n_j}) \geq \lim_{j \to \infty} \left(m - \frac{1}{j}\right) = m.$$

On the other hand, we know that $f(x) \leq m$ for all $x \in [a, b]$, so in particular
$f(x_{\max}) \leq m$. Combining these two inequalities we see that $f(x_{\max}) = m$ as
desired.

Note that the maximum principle does not prevent a function from attaining its
maximum or minimum at more than one point. For instance, the function $f(x) := x^2$
on the interval $[-2, 2]$ attains its maximum at two different points, at $-2$ and at $2$.

Let us write $\sup_{x \in [a,b]} f(x)$ as shorthand for $\sup\{f(x) : x \in [a, b]\}$, and similarly
define $\inf_{x \in [a,b]} f(x)$. The maximum principle thus asserts that $m := \sup_{x \in [a,b]} f(x)$
is a real number and is the maximum value of $f$ on $[a, b]$; i.e., there is at least one
point $x_{\max}$ in $[a, b]$ for which $f(x_{\max}) = m$, and for every other $x \in [a, b]$, $f(x)$ is
less than or equal to $m$. Similarly $\inf_{x \in [a,b]} f(x)$ is the minimum value of $f$ on $[a, b]$.

We now know that on a closed interval, every continuous function is bounded and
attains its maximum at least once and minimum at least once. The same is not true
for open or infinite intervals; see Exercise 9.6.1.
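The remark that a maximum may be attained at more than one point can be illustrated numerically. The sketch below samples $f(x) = x^2$ on a uniform grid of $[-2, 2]$ and records where the largest sampled value occurs; the helper `sample_max` and the grid size are assumptions made for this illustration. A finite grid can only suggest where the maximum lies; the existence of the maximum is what the maximum principle actually guarantees.

```python
def sample_max(f, a, b, n=1000):
    """Return (largest sampled value, grid points attaining it) on [a, b]."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]  # includes both endpoints
    best = max(f(x) for x in xs)
    return best, [x for x in xs if f(x) == best]

def square(x):
    return x * x

best, argmaxes = sample_max(square, -2.0, 2.0)
print(best, argmaxes)
```

Here the largest sampled value is attained at both endpoints of the interval, in agreement with the text.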


Remark 9.6.9 You may encounter a rather different “maximum principle” in com- 
plex analysis or partial differential equations, involving analytic functions and har- 
monic functions, respectively, instead of continuous functions. Those maximum prin- 
ciples are not directly related to this one (though they are also concerned with whether 
maxima exist, and where the maxima are located). 


— Exercises — 


Exercise 9.6.1 Give examples of 


(a) a function $f: (1, 2) \to \mathbf{R}$ which is continuous and bounded, attains its minimum somewhere,
but does not attain its maximum anywhere;

(b) a function $f: [0, \infty) \to \mathbf{R}$ which is continuous, bounded, attains its maximum somewhere,
but does not attain its minimum anywhere;

(c) a function $f: [-1, 1] \to \mathbf{R}$ which is bounded but does not attain its minimum anywhere or
its maximum anywhere;

(d) a function $f: [-1, 1] \to \mathbf{R}$ which has no upper bound and no lower bound.


Explain why none of the examples you construct violate the maximum principle. (Note: read the 
assumptions of that principle carefully!) 


Exercise 9.6.2 If $f, g: X \to \mathbf{R}$ are bounded functions, show that $f + g$, $f - g$, and $f \cdot g$ are also
bounded functions. If we furthermore assume that $g(x) \neq 0$ for all $x \in X$, is it true that $f/g$ is
bounded? Prove this or give a counterexample.


9.7 The Intermediate Value Theorem


We have just shown that a continuous function attains both its maximum value and 
its minimum value. We now show that f also attains every value in between. To do 
this, we first prove a very intuitive theorem: 


Theorem 9.7.1 (Intermediate value theorem) Let $a < b$, and let $f: [a, b] \to \mathbf{R}$ be
a continuous function on $[a, b]$. Let $y$ be a real number between $f(a)$ and $f(b)$, i.e.,
either $f(a) \leq y \leq f(b)$ or $f(a) \geq y \geq f(b)$. Then there exists $c \in [a, b]$ such that
$f(c) = y$.


Proof We have two cases: $f(a) \leq y \leq f(b)$ or $f(a) \geq y \geq f(b)$. We will assume
the former, that $f(a) \leq y \leq f(b)$; the latter is proven similarly and is left to the
reader.

If $y = f(a)$ or $y = f(b)$, then the claim is easy, as one can simply set $c = a$ or
$c = b$, so we will assume that $f(a) < y < f(b)$. Let $E$ denote the set

$$E := \{x \in [a, b] : f(x) < y\}.$$

Clearly $E$ is a subset of $[a, b]$ and is hence bounded. Also, since $f(a) < y$, we see
that $a$ is an element of $E$, so $E$ is non-empty. By the least upper bound principle, the
supremum

$$c := \sup(E)$$

is thus finite. Since $E$ is bounded by $b$, we know that $c \leq b$; since $E$ contains $a$, we
know that $c \geq a$. Thus we have $c \in [a, b]$. To complete the proof we now show that
$f(c) = y$. The idea is to work from the left of $c$ to show that $f(c) \leq y$ and to work
from the right of $c$ to show that $f(c) \geq y$.

Let $n \geq 1$ be an integer. The number $c - \frac{1}{n}$ is less than $c = \sup(E)$ and hence
cannot be an upper bound for $E$. Thus there exists a point, call it $x_n$, which lies in $E$
and which is greater than $c - \frac{1}{n}$. Also $x_n \leq c$ since $c$ is an upper bound for $E$. Thus

$$c - \frac{1}{n} \leq x_n \leq c.$$

By the squeeze test (Corollary 6.4.14) we thus have $\lim_{n \to \infty} x_n = c$. Since $f$ is
continuous at $c$, this implies that $\lim_{n \to \infty} f(x_n) = f(c)$. But since $x_n$ lies in $E$ for
every $n$, we have $f(x_n) < y$ for every $n$. By the comparison principle (Lemma 6.4.13)
we thus have $f(c) \leq y$. Since $f(b) > f(c)$, we conclude $c \neq b$.

Since $c \neq b$ and $c \in [a, b]$, we must have $c < b$. In particular there is an $N > 0$
such that $c + \frac{1}{n} < b$ for all $n > N$ (since $c + \frac{1}{n}$ converges to $c$ as $n \to \infty$). Since $c$
is the supremum of $E$ and $c + \frac{1}{n} > c$, we thus have $c + \frac{1}{n} \notin E$ for all $n > N$. Since
$c + \frac{1}{n} \in [a, b]$, we thus have $f(c + \frac{1}{n}) \geq y$ for all $n > N$. But $c + \frac{1}{n}$ converges to $c$,
and $f$ is continuous at $c$, thus $f(c) \geq y$. But we already knew that $f(c) \leq y$, thus
$f(c) = y$, as desired.


The intermediate value theorem says that if $f$ takes the values $f(a)$ and $f(b)$,
then it must also take all the values in between. Note that if $f$ is not assumed to be
continuous, then the intermediate value theorem no longer applies. For instance, if
$f: [-1, 1] \to \mathbf{R}$ is the function

$$f(x) := \begin{cases} -1 & \text{if } x \leq 0 \\ 1 & \text{if } x > 0 \end{cases}$$

then $f(-1) = -1$, and $f(1) = 1$, but there is no $c \in [-1, 1]$ for which $f(c) = 0$.
Thus if a function is discontinuous, it can “jump” past intermediate values; however
continuous functions cannot do so.


Remark 9.7.2 A continuous function may take an intermediate value multiple times.
For instance, if $f: [-2, 2] \to \mathbf{R}$ is the function $f(x) := x^3 - x$, then $f(-2) = -6$
and $f(2) = 6$, so we know that there exists a $c \in [-2, 2]$ for which $f(c) = 0$. In fact,
in this case there exist three such values of $c$: we have $f(-1) = f(0) = f(1) = 0$.

Remark 9.7.3 The intermediate value theorem gives another way to show that one
can take $n$th roots of a number. For instance, to construct the square root of $2$, consider
the function $f: [0, 2] \to \mathbf{R}$ defined by $f(x) = x^2$. This function is continuous, with
$f(0) = 0$ and $f(2) = 4$. Thus there exists a $c \in [0, 2]$ such that $f(c) = 2$, i.e., $c^2 = 2$.
(This argument does not show that there is just one square root of 2, but it does prove
that there is at least one square root of 2.)
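The intermediate value theorem only asserts that such a $c$ exists. To approximate it numerically one can use bisection, which is a different technique from the supremum argument in the proof above: it repeatedly halves an interval $[lo, hi]$ while keeping $f(lo) \leq y \leq f(hi)$. The sketch below is a minimal illustration under the assumption that $f$ is continuous and $f(a) \leq y \leq f(b)$; the function name `bisect_value` and the tolerance are hypothetical choices.

```python
def bisect_value(f, a, b, y, tol=1e-12):
    """Approximate some c in [a, b] with f(c) = y by bisection.

    Assumes f is continuous on [a, b] and f(a) <= y <= f(b).
    """
    if not f(a) <= y <= f(b):
        raise ValueError("y must lie between f(a) and f(b)")
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) <= y:
            lo = mid  # keeps f(lo) <= y
        else:
            hi = mid  # keeps f(hi) >= y
    return (lo + hi) / 2.0

def square(x):
    return x * x

# Approximate the square root of 2 as the point where x^2 takes the value 2 on [0, 2].
c = bisect_value(square, 0.0, 2.0, 2.0)
print(c)
```

As with the theorem itself, this locates one solution; it says nothing about whether other solutions exist in the interval.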


Corollary 9.7.4 (Images of continuous functions) Let $a < b$, and let $f: [a, b] \to \mathbf{R}$
be a continuous function on $[a, b]$. Let $M := \sup_{x \in [a,b]} f(x)$ be the maximum value of
$f$, and let $m := \inf_{x \in [a,b]} f(x)$ be the minimum value. Let $y$ be a real number between
$m$ and $M$ (i.e., $m \leq y \leq M$). Then there exists a $c \in [a, b]$ such that $f(c) = y$.
Furthermore, we have $f([a, b]) = [m, M]$.


Proof See Exercise 9.7.1. 


— Exercises — 


Exercise 9.7.1 Prove Corollary 9.7.4. (Hint: you may need Exercise 9.4.6 in addition to the inter- 
mediate value theorem.) 


Exercise 9.7.2. Let $f: [0, 1] \to [0, 1]$ be a continuous function. Show that there exists a real
number $x$ in $[0, 1]$ such that $f(x) = x$. (Hint: apply the intermediate value theorem to the function
$f(x) - x$.) This point $x$ is known as a fixed point of $f$, and this result is a basic example of a fixed
point theorem; such theorems play an important role in certain types of analysis.


9.8 Monotonic Functions 


We now discuss a class of functions which is distinct from the class of continuous 
functions, but has somewhat similar properties: the class of monotone (or monotonic) 
functions. 


Definition 9.8.1 (Monotonic functions) Let $X$ be a subset of $\mathbf{R}$, and let $f: X \to \mathbf{R}$ be
a function. We say that $f$ is monotone increasing iff $f(y) \ge f(x)$ whenever $x, y \in X$
and $y > x$. We say that $f$ is strictly monotone increasing iff $f(y) > f(x)$ whenever
$x, y \in X$ and $y > x$. Similarly, we say $f$ is monotone decreasing iff $f(y) \le f(x)$
whenever $x, y \in X$ and $y > x$, and strictly monotone decreasing iff $f(y) < f(x)$
whenever $x, y \in X$ and $y > x$. We say that $f$ is monotone if it is monotone increasing
or monotone decreasing, and strictly monotone if it is strictly monotone increasing
or strictly monotone decreasing.


Examples 9.8.2. The function $f(x) := x^2$, when restricted to the domain $[0, \infty)$,
is strictly monotone increasing (why?), but when restricted instead to the domain
$(-\infty, 0]$, is strictly monotone decreasing (why?). Thus the function is strictly monotone
on both $(-\infty, 0]$ and $[0, \infty)$, but is not strictly monotone (or monotone) on the
full real line $(-\infty, \infty)$. Note that if a function is strictly monotone on a domain $X$,
it is automatically monotone as well on the same domain $X$. The constant function
$f(x) := 6$, when restricted to an arbitrary domain $X \subseteq \mathbf{R}$, is both monotone increasing
and monotone decreasing, but is not strictly monotone (unless $X$ consists of at
most one point; why?).
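
The claims in these examples can be spot-checked on a finite sample of points with a short Python sketch. The helper `is_monotone_increasing` and the sample grid are hypothetical names chosen for illustration. A finite sample can refute monotonicity but can never prove it, so this is a sanity check rather than a proof; the functions tested are $x \mapsto x^2$ and the constant function $6$.

```python
def is_monotone_increasing(f, points, strict=False):
    """Check f(y) >= f(x) (or f(y) > f(x) if strict) for all sampled y > x."""
    pts = sorted(set(points))
    values = [f(p) for p in pts]
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if strict and not values[j] > values[i]:
                return False
            if not strict and not values[j] >= values[i]:
                return False
    return True


sample = [k / 10 for k in range(-20, 21)]      # grid on [-2, 2]
nonneg = [p for p in sample if p >= 0]         # grid on [0, 2]
square = lambda x: x * x
constant = lambda x: 6

square_on_nonneg = is_monotone_increasing(square, nonneg, strict=True)
square_on_all = is_monotone_increasing(square, sample)
constant_weak = is_monotone_increasing(constant, sample)
constant_strict = is_monotone_increasing(constant, sample, strict=True)
```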


Continuous functions are not necessarily monotone (consider for instance the
function $f(x) = x^2$ on $\mathbf{R}$), and monotone functions are not necessarily continuous;
for instance, consider the function $f: [-1, 1] \to \mathbf{R}$ defined earlier by

\[ f(x) := \begin{cases} -1 & \text{if } x \le 0 \\ 1 & \text{if } x > 0. \end{cases} \]

Monotone functions obey the maximum principle (Exercise 9.8.1), but not the inter- 
mediate value principle (Exercise 9.8.2). On the other hand, it is possible for a 
monotone function to have many, many discontinuities (Exercise 9.8.5). 
If a function is both strictly monotone and continuous, then it has many nice 
properties. In particular, it is invertible: 


Proposition 9.8.3. Let $a < b$ be real numbers, and let $f: [a, b] \to \mathbf{R}$ be a function
which is both continuous and strictly monotone increasing. Then $f$ is a bijection
from $[a, b]$ to $[f(a), f(b)]$, and the inverse $f^{-1}: [f(a), f(b)] \to [a, b]$ is also
continuous and strictly monotone increasing.


Proof See Exercise 9.8.4. 


There is a similar Proposition for functions which are strictly monotone decreas- 
ing; see Exercise 9.8.4. 


Example 9.8.4 Let $n$ be a positive integer and $R > 0$. Since the function $f(x) := x^n$
is strictly increasing on the interval $[0, R]$, we see from Proposition 9.8.3 that this
function is a bijection from $[0, R]$ to $[0, R^n]$, and hence there is an inverse from
$[0, R^n]$ to $[0, R]$. This can be used to give an alternate means to construct the $n^{\text{th}}$
root $x^{1/n}$ of a number $x \in [0, R^n]$ than what was done in Lemma 5.6.5.


— Exercises — 


Exercise 9.8.1 Explain why the maximum principle remains true if the hypothesis that f is con- 
tinuous is replaced with f being monotone, or with f being strictly monotone. (You can use the 
same explanation for both cases.) 


Exercise 9.8.2. Give an example to show that the intermediate value theorem becomes false if 
the hypothesis that f is continuous is replaced with f being monotone, or with f being strictly 
monotone. (You can use the same counterexample for both cases.) 

Exercise 9.8.3 Let $a < b$ be real numbers, and let $f: [a, b] \to \mathbf{R}$ be a function which is both
continuous and one-to-one. Show that $f$ is strictly monotone. (Hint: divide into the three cases
$f(a) < f(b)$, $f(a) = f(b)$, $f(a) > f(b)$. The second case leads directly to a contradiction. In the
first case, use contradiction and the intermediate value theorem to show that $f$ is strictly monotone
increasing; in the third case, argue similarly to show $f$ is strictly monotone decreasing.)


Exercise 9.8.4 Prove Proposition 9.8.3. (Hint: to show that $f^{-1}$ is continuous, it is easiest to use
the “epsilon-delta” definition of continuity, Proposition 9.4.7c.) Is the proposition still true if the 
continuity assumption is dropped, or if strict monotonicity is replaced just by monotonicity? How 
should one modify the proposition to deal with strictly monotone decreasing functions instead of 
strictly monotone increasing functions? 


Exercise 9.8.5 In this exercise we give an example of a function which has a discontinuity at every 
rational point, but is continuous at every irrational. Since the rationals are countable, we can write 
them as $\mathbf{Q} = \{q(0), q(1), q(2), \ldots\}$, where $q: \mathbf{N} \to \mathbf{Q}$ is a bijection from $\mathbf{N}$ to $\mathbf{Q}$. Now define a
function $g: \mathbf{Q} \to \mathbf{R}$ by setting $g(q(n)) := 2^{-n}$ for each natural number $n$; thus $g$ maps $q(0)$ to $1$,
$q(1)$ to $2^{-1}$, etc. Since $\sum_{n=0}^\infty 2^{-n}$ is absolutely convergent, we see that $\sum_{r \in \mathbf{Q}} g(r)$ is also absolutely
convergent. Now define the function $f: \mathbf{R} \to \mathbf{R}$ by

\[ f(x) := \sum_{r \in \mathbf{Q}: r < x} g(r). \]

Since $\sum_{r \in \mathbf{Q}} g(r)$ is absolutely convergent, we know that $f(x)$ is well-defined for every real number
$x$.

(a) Show that $f$ is strictly monotone increasing. (Hint: you will need Proposition 5.4.14.)

(b) Show that for every rational number $r$, $f$ is discontinuous at $r$. (Hint: since $r$ is rational,
$r = q(n)$ for some natural number $n$. Show that $f(x) \ge f(r) + 2^{-n}$ for all $x > r$.)

(c) Show that for every irrational number $x$, $f$ is continuous at $x$. (Hint: first demonstrate that the
functions

\[ f_n(x) := \sum_{r \in \mathbf{Q}: r < x,\ g(r) \ge 2^{-n}} g(r) \]

are continuous at $x$, and that $|f(x) - f_n(x)| \le 2^{-n}$.)


9.9 Uniform Continuity 


We know that a continuous function on a closed interval [a, b] remains bounded (and 
in fact attains its maximum and minimum, by the maximum principle). However, 
if we replace the closed interval by an open interval, then continuous functions 
need not be bounded any more. An example is the function f: (0,2) — R defined 
by $f(x) := 1/x$. This function is continuous at every point in $(0, 2)$ and is hence
continuous on $(0, 2)$, but is not bounded. Informally speaking, the problem here is
that while the function is indeed continuous at every point in the open interval $(0, 2)$,
it becomes “less and less” continuous as one approaches the endpoint $0$.

Let us analyze this phenomenon further, using the “epsilon-delta” definition of
continuity (Proposition 9.4.7c). We know that if $f: X \to \mathbf{R}$ is continuous at a point
$x_0$, then for every $\varepsilon > 0$ there exists a $\delta$ such that $f(x)$ will be $\varepsilon$-close to $f(x_0)$
whenever $x \in X$ is $\delta$-close to $x_0$. In other words, we can force $f(x)$ to be $\varepsilon$-close to
$f(x_0)$ if we ensure that $x$ is sufficiently close to $x_0$. One way of thinking about this
is that around every point $x_0$ there is an “island of stability” $(x_0 - \delta, x_0 + \delta)$, where
the function $f(x)$ doesn’t stray by more than $\varepsilon$ from $f(x_0)$.


Example 9.9.1 Take the function $f(x) := 1/x$ mentioned above at the point $x_0 = 1$.
In order to ensure that $f(x)$ is $0.1$-close to $f(x_0)$, it suffices to take $x$ to be $1/11$-close
to $x_0$, since if $x$ is $1/11$-close to $x_0$ then $10/11 < x < 12/11$, and so $11/12 <
f(x) < 11/10$, and so $f(x)$ is $0.1$-close to $f(x_0)$. Thus the “$\delta$” one needs to make
$f(x)$ $0.1$-close to $f(x_0)$ is about $1/11$ or so, at the point $x_0 = 1$.

Now let us look instead at the point $x_0 = 0.1$. The function $f(x) = 1/x$ is still
continuous here, but we shall see the continuity is much worse. In order to ensure
that $f(x)$ is $0.1$-close to $f(x_0)$, we need $x$ to be $1/1010$-close to $x_0$. Indeed, if $x$ is
$1/1010$-close to $x_0$, then $10/101 < x < 102/1010$, and so $9.901 < f(x) < 10.1$, so
$f(x)$ is $0.1$-close to $f(x_0)$. Thus one needs a much smaller “$\delta$” for the same value
of $\varepsilon$, i.e., $f(x)$ is much more “unstable” near $0.1$ than it is near $1$, in the sense that
there is a much smaller “island of stability” around $0.1$ than there is around $1$ (if one is
interested in keeping $f(x)$ $0.1$-stable).
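
The two values of $\delta$ in Example 9.9.1 can be recovered from a closed formula. For $x_0 > 0$ the worst case is $x = x_0 - \delta$, where $1/x - 1/x_0 = \delta / (x_0 (x_0 - \delta))$; setting this equal to $\varepsilon$ and solving gives $\delta = \varepsilon x_0^2 / (1 + \varepsilon x_0)$. The Python sketch below evaluates this formula in exact rational arithmetic. The function name `largest_delta` is a hypothetical choice, and the sketch assumes $x_0 > 0$ and ignores the right endpoint of the domain $(0, 2)$.

```python
from fractions import Fraction


def largest_delta(x0, eps):
    """Largest delta with |1/x - 1/x0| <= eps whenever |x - x0| <= delta.

    Assumes x0 > 0; derived from the worst case x = x0 - delta.
    """
    return eps * x0 * x0 / (1 + eps * x0)


eps = Fraction(1, 10)
delta_at_1 = largest_delta(Fraction(1), eps)          # island of stability at x0 = 1
delta_at_tenth = largest_delta(Fraction(1, 10), eps)  # island of stability at x0 = 0.1
```

The formula makes the informal point quantitative: for fixed $\varepsilon$ the admissible $\delta$ shrinks roughly like $\varepsilon x_0^2$ as $x_0$ approaches $0$.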


On the other hand, there are other continuous functions which do not exhibit this
behavior. Consider the function $g: (0, 2) \to \mathbf{R}$ defined by $g(x) := 2x$. Let us fix
$\varepsilon = 0.1$ as before and investigate the island of stability around $x_0 = 1$. It is clear that
if $x$ is $0.05$-close to $x_0$, then $g(x)$ is $0.1$-close to $g(x_0)$; in this case we can take $\delta$
to be $0.05$ at $x_0 = 1$. And if we move $x_0$ around, say if we set $x_0$ to $0.1$ instead, the
$\delta$ does not change: even when $x_0$ is set to $0.1$ instead of $1$, we see that $g(x)$ will
stay $0.1$-close to $g(x_0)$ whenever $x$ is $0.05$-close to $x_0$. Indeed, the same $\delta$ works for
every $x_0$. When this happens, we say that the function $g$ is uniformly continuous.
More precisely:


Definition 9.9.2 [Uniform continuity] Let $X$ be a subset of $\mathbf{R}$, and let $f: X \to \mathbf{R}$
be a function. We say that $f$ is uniformly continuous if, for every $\varepsilon > 0$, there exists
a $\delta > 0$ such that $f(x)$ and $f(x_0)$ are $\varepsilon$-close whenever $x, x_0 \in X$ are two points in
$X$ which are $\delta$-close.


Remark 9.9.3. This definition should be compared with the notion of continuity.
From Proposition 9.4.7c, we know that a function $f$ is continuous if for every $\varepsilon > 0$,
and every $x_0 \in X$, there is a $\delta > 0$ such that $f(x)$ and $f(x_0)$ are $\varepsilon$-close whenever
$x \in X$ is $\delta$-close to $x_0$. The difference between uniform continuity and continuity
is that in uniform continuity one can take a single $\delta$ which works for all $x_0 \in X$;
for ordinary continuity, each $x_0 \in X$ might use a different $\delta$. Thus every uniformly
continuous function is continuous, but not conversely.


Example 9.9.4 (Informal) The function $f: (0, 2) \to \mathbf{R}$ defined by $f(x) := 1/x$ is
continuous on $(0, 2)$, but not uniformly continuous, because the continuity (or more
precisely, the dependence of $\delta$ on $\varepsilon$) becomes worse and worse as $x \to 0$. (We will
make this more precise in Example 9.9.10.)

Recall that the notions of adherent point and of continuous function had several
equivalent formulations; both had “epsilon-delta” type formulations (involving the
notion of $\varepsilon$-closeness), and both had “sequential” formulations (involving the
convergence of sequences); see Lemma 9.1.14 and Proposition 9.3.9. The concept of
uniform continuity can similarly be phrased in a sequential formulation, this time 
using the concept of equivalent sequences (cf. Definition 5.2.6, but we now gener- 
alize to sequences of real numbers instead of rationals, and no longer require the 
sequences to be Cauchy): 


Definition 9.9.5 (Equivalent sequences) Let $m$ be an integer, let $(a_n)_{n=m}^\infty$ and
$(b_n)_{n=m}^\infty$ be two sequences of real numbers, and let $\varepsilon > 0$ be given. We say that
$(a_n)_{n=m}^\infty$ is $\varepsilon$-close to $(b_n)_{n=m}^\infty$ iff $a_n$ is $\varepsilon$-close to $b_n$ for each $n \ge m$. We say that
$(a_n)_{n=m}^\infty$ is eventually $\varepsilon$-close to $(b_n)_{n=m}^\infty$ iff there exists an $N \ge m$ such that the
sequences $(a_n)_{n=N}^\infty$ and $(b_n)_{n=N}^\infty$ are $\varepsilon$-close. Two sequences $(a_n)_{n=m}^\infty$ and $(b_n)_{n=m}^\infty$
are equivalent iff for each $\varepsilon > 0$, the sequences $(a_n)_{n=m}^\infty$ and $(b_n)_{n=m}^\infty$ are eventually
$\varepsilon$-close.


Remark 9.9.6 One could debate whether $\varepsilon$ should be assumed to be rational or real,
but a minor modification of Proposition 6.1.4 shows that this does not make any 
difference to the above definitions. 


The notion of equivalence can be phrased more succinctly using our language of 
limits: 


Lemma 9.9.7 Let $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ be sequences of real numbers (not necessarily
bounded or convergent). Then $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ are equivalent if and only if
$\lim_{n \to \infty} (a_n - b_n) = 0$.


Proof See Exercise 9.9.1. 


Meanwhile, the notion of uniform continuity can be phrased using equivalent 
sequences: 


Proposition 9.9.8 Let X be a subset of R, and let f : X — R be a function. Then 
the following two statements are logically equivalent: 


(a) f is uniformly continuous on X. 
(b) Whenever $(x_n)_{n=0}^\infty$ and $(y_n)_{n=0}^\infty$ are two equivalent sequences consisting of elements
of $X$, the sequences $(f(x_n))_{n=0}^\infty$ and $(f(y_n))_{n=0}^\infty$ are also equivalent.


Proof See Exercise 9.9.2. 


Remark 9.9.9 The reader should compare this with Proposition 9.3.9. Proposition 
9.3.9 asserted that if f was continuous, then f maps convergent sequences to conver- 
gent sequences. In contrast, Proposition 9.9.8 asserts that if f is uniformly continuous, 
then f maps equivalent pairs of sequences to equivalent pairs of sequences. To see 
how the two Propositions are connected, observe from Lemma 9.9.7 that $(x_n)_{n=0}^\infty$
will converge to $x_*$ if and only if the sequences $(x_n)_{n=0}^\infty$ and $(x_*)_{n=0}^\infty$ are equivalent.

Example 9.9.10 Consider the function $f: (0, 2) \to \mathbf{R}$ defined by $f(x) := 1/x$
considered earlier. From Lemma 9.9.7 we see that the sequences $(1/n)_{n=1}^\infty$ and
$(1/2n)_{n=1}^\infty$ are equivalent sequences in $(0, 2)$. However, the sequences $(f(1/n))_{n=1}^\infty$
and $(f(1/2n))_{n=1}^\infty$ are not equivalent (why? Use Lemma 9.9.7 again). So by
Proposition 9.9.8, $f$ is not uniformly continuous. (These sequences start at 1 instead of 0,
but the reader can easily see that this makes no difference to the above discussion.)


Example 9.9.11 Consider the function $f: \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x^2$. This
is a continuous function on $\mathbf{R}$, but it turns out not to be uniformly continuous;
in some sense the continuity gets “worse and worse” as one approaches infinity.
One way to quantify this is via Proposition 9.9.8. Consider the sequences
$(n)_{n=1}^\infty$ and $(n + \frac{1}{n})_{n=1}^\infty$. By Lemma 9.9.7, these sequences are equivalent. But
the sequences $(f(n))_{n=1}^\infty$ and $(f(n + \frac{1}{n}))_{n=1}^\infty$ are not equivalent, since $f(n + \frac{1}{n}) =
n^2 + 2 + \frac{1}{n^2} = f(n) + 2 + \frac{1}{n^2}$ does not become eventually $2$-close to $f(n)$. By
Proposition 9.9.8 we can thus conclude that $f$ is not uniformly continuous.
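
The computation in Example 9.9.11 can be tabulated exactly for the function $x \mapsto x^2$ and the sequences $n$ and $n + 1/n$. The following Python sketch uses rational arithmetic; the variable names are hypothetical and the chosen values of $n$ are arbitrary.

```python
from fractions import Fraction


def square(x):
    return x * x


gaps_in_input = []    # |x_n - y_n|, which tends to 0
gaps_in_output = []   # |f(x_n) - f(y_n)|, which stays above 2
for n in [1, 10, 100, 1000]:
    x = Fraction(n)
    y = Fraction(n) + Fraction(1, n)
    gaps_in_input.append(abs(x - y))                    # equals 1/n
    gaps_in_output.append(abs(square(x) - square(y)))   # equals 2 + 1/n^2
```

The inputs become arbitrarily close while the outputs never come within $2$ of each other, which is exactly the failure of equivalence that Proposition 9.9.8 detects.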


Another property of uniformly continuous functions is that they map Cauchy 
sequences to Cauchy sequences. 


Proposition 9.9.12 Let X be a subset of R, and let f: X — R be a uniformly 
continuous function. Let $(x_n)_{n=0}^\infty$ be a Cauchy sequence consisting entirely of elements
in $X$. Then $(f(x_n))_{n=0}^\infty$ is also a Cauchy sequence.


Proof See Exercise 9.9.3. 


Example 9.9.13 Once again, we demonstrate that the function $f: (0, 2) \to \mathbf{R}$
defined by $f(x) := 1/x$ is not uniformly continuous. The sequence $(1/n)_{n=1}^\infty$ is a
Cauchy sequence in $(0, 2)$, but the sequence $(f(1/n))_{n=1}^\infty$ is not a Cauchy sequence
(why?). Thus by Proposition 9.9.12, $f$ is not uniformly continuous.


Corollary 9.9.14 Let $X$ be a subset of $\mathbf{R}$, let $f: X \to \mathbf{R}$ be a uniformly continuous
function, and let $x_0$ be an adherent point of $X$. Then the limit $\lim_{x \to x_0; x \in X} f(x)$ exists
(in particular, it is a real number).


Proof See Exercise 9.9.4. 


We now show that a uniformly continuous function will map bounded sets to 
bounded sets. 


Proposition 9.9.15 Let X be a subset of R, and let f: X — R be a uniformly 
continuous function. Suppose that E is a bounded subset of X. Then f (E) is also 
bounded. 


Proof See Exercise 9.9.5. 


As we have just seen repeatedly, not all continuous functions are uniformly con- 
tinuous. However, if the domain of the function is a closed interval, then continuous 
functions are in fact uniformly continuous: 

Theorem 9.9.16 Let $a < b$ be real numbers, and let $f: [a, b] \to \mathbf{R}$ be a function
which is continuous on $[a, b]$. Then $f$ is also uniformly continuous.


Proof Suppose for sake of contradiction that $f$ is not uniformly continuous. By
Proposition 9.9.8, there must therefore exist two equivalent sequences $(x_n)_{n=0}^\infty$ and
$(y_n)_{n=0}^\infty$ in $[a, b]$ such that the sequences $(f(x_n))_{n=0}^\infty$ and $(f(y_n))_{n=0}^\infty$ are not
equivalent. In particular, we can find an $\varepsilon > 0$ such that $(f(x_n))_{n=0}^\infty$ and $(f(y_n))_{n=0}^\infty$ are not
eventually $\varepsilon$-close.

Fix this value of $\varepsilon$, and let $E$ be the set

\[ E := \{ n \in \mathbf{N} : f(x_n) \text{ and } f(y_n) \text{ are not } \varepsilon\text{-close} \}. \]

We must have $E$ infinite, since if $E$ were finite then $(f(x_n))_{n=0}^\infty$ and $(f(y_n))_{n=0}^\infty$
would be eventually $\varepsilon$-close (why?). By Proposition 8.1.5, $E$ is countable; in fact
from the proof of that proposition we see that we can find an infinite sequence

\[ n_0 < n_1 < n_2 < \ldots \]

consisting entirely of elements in $E$. In particular, we have

\[ |f(x_{n_j}) - f(y_{n_j})| > \varepsilon \text{ for all } j \in \mathbf{N}. \quad (9.3) \]

On the other hand, the sequence $(x_{n_j})_{j=0}^\infty$ is a sequence in $[a, b]$, and so by the
Heine-Borel theorem (Theorem 9.1.24) there must be a subsequence $(x_{n_{j_k}})_{k=0}^\infty$ which
converges to some limit $L$ in $[a, b]$. In particular, $f$ is continuous at $L$, and so by
Proposition 9.4.7,

\[ \lim_{k \to \infty} f(x_{n_{j_k}}) = f(L). \quad (9.4) \]

Note that $(x_{n_{j_k}})_{k=0}^\infty$ is a subsequence of $(x_n)_{n=0}^\infty$, and $(y_{n_{j_k}})_{k=0}^\infty$ is a subsequence of
$(y_n)_{n=0}^\infty$, by Lemma 6.6.4. On the other hand, from Lemma 9.9.7 we have

\[ \lim_{n \to \infty} (x_n - y_n) = 0. \]

By Proposition 6.6.5, we thus have

\[ \lim_{k \to \infty} (x_{n_{j_k}} - y_{n_{j_k}}) = 0. \]

Since $x_{n_{j_k}}$ converges to $L$ as $k \to \infty$, we thus have by limit laws

\[ \lim_{k \to \infty} y_{n_{j_k}} = L \]

and hence by continuity of $f$ at $L$

\[ \lim_{k \to \infty} f(y_{n_{j_k}}) = f(L). \]

Subtracting this from (9.4) using limit laws, we obtain

\[ \lim_{k \to \infty} (f(x_{n_{j_k}}) - f(y_{n_{j_k}})) = 0. \]


But this contradicts (9.3) (why?). From this contradiction we conclude that f is in 
fact uniformly continuous. 


Remark 9.9.17 One should compare Lemma 9.6.3, Proposition 9.9.15, and Theorem 
9.9.16 with each other. Note in particular that the lemma follows from combining 
the proposition with the theorem. 


— Exercises — 
Exercise 9.9.1 Prove Lemma 9.9.7. 


Exercise 9.9.2. Prove Proposition 9.9.8. (Hint: you should avoid Lemma 9.9.7, and instead go back 
to the definition of equivalent sequences in Definition 9.9.5.) 


Exercise 9.9.3 Prove Proposition 9.9.12. (Hint: use Definition 9.9.2 directly.) 


Exercise 9.9.4. Use Proposition 9.9.12 to prove Corollary 9.9.14. Use this corollary to give an 
alternate demonstration of the results in Example 9.9.10. 


Exercise 9.9.5 Prove Proposition 9.9.15. (Hint: mimic the proof of Lemma 9.6.3. At some point 
you will need either Proposition 9.9.12 or Corollary 9.9.14.) 


Exercise 9.9.6 Let $X, Y, Z$ be subsets of $\mathbf{R}$. Let $f: X \to Y$ be a function which is uniformly
continuous on $X$, and let $g: Y \to Z$ be a function which is uniformly continuous on $Y$. Show that
the function $g \circ f: X \to Z$ is uniformly continuous on $X$.


9.10 Limits at Infinity 


Until now, we have discussed what it means for a function $f: X \to \mathbf{R}$ to have a limit
as $x \to x_0$ as long as $x_0$ is a real number. We now briefly discuss what it would mean
to take limits when $x_0$ is equal to $+\infty$ or $-\infty$. (This is part of a more general theory
of continuous functions on a topological space; see Sect. 11.12.)

First, we need a notion of what it means for $+\infty$ or $-\infty$ to be adherent to a set.


Definition 9.10.1 (Infinite adherent points) Let $X$ be a subset of $\mathbf{R}$. We say that $+\infty$
is adherent to $X$ iff for every $M \in \mathbf{R}$ there exists an $x \in X$ such that $x > M$; we
say that $-\infty$ is adherent to $X$ iff for every $M \in \mathbf{R}$ there exists an $x \in X$ such that
$x < M$.


In other words, $+\infty$ is adherent to $X$ iff $X$ has no upper bound, or equivalently
iff $\sup(X) = +\infty$. Similarly $-\infty$ is adherent to $X$ iff $X$ has no lower bound, or iff
$\inf(X) = -\infty$. Thus a set is bounded if and only if $+\infty$ and $-\infty$ are not adherent
points.

Remark 9.10.2 This definition may seem rather different from Definition 9.1.8, but
can be unified using the topological structure of the extended real line $\mathbf{R}^*$, which we
will not discuss here.


Definition 9.10.3 (Limits at infinity) Let $X$ be a subset of $\mathbf{R}$ with $+\infty$ as an adherent
point, and let $f: X \to \mathbf{R}$ be a function. We say that $f(x)$ converges to $L$ as $x \to +\infty$
in $X$, and write $\lim_{x \to +\infty; x \in X} f(x) = L$, iff for every $\varepsilon > 0$ there exists an $M$ such
that $f$ is $\varepsilon$-close to $L$ on $X \cap (M, +\infty)$ (i.e., $|f(x) - L| \le \varepsilon$ for all $x \in X$ such that
$x > M$). Similarly we say that $f(x)$ converges to $L$ as $x \to -\infty$ iff for every $\varepsilon > 0$
there exists an $M$ such that $f$ is $\varepsilon$-close to $L$ on $X \cap (-\infty, M)$.


Example 9.10.4 Let $f: (0, \infty) \to \mathbf{R}$ be the function $f(x) := 1/x$. Then we have
$\lim_{x \to +\infty; x \in (0, \infty)} 1/x = 0$. (Can you see why, from the definition?)
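
One way to test Definition 9.10.3 against Example 9.10.4 is to propose a threshold $M$ for a given $\varepsilon$ and check it on sample points. The Python sketch below tries $M = 1/\varepsilon$ for the function $1/x$; the name `threshold` and the sample points are hypothetical choices, and checking finitely many points illustrates the definition without proving the limit.

```python
from fractions import Fraction


def threshold(eps):
    """A candidate M for f(x) = 1/x in the definition of the limit at +infinity."""
    return 1 / eps


eps = Fraction(1, 100)
M = threshold(eps)
# Sample points x > M in the domain (0, infinity).
samples = [M + Fraction(k, 7) for k in range(1, 50)]
all_close = all(abs(1 / x - 0) <= eps for x in samples)
```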


One can do many of the same things with these limits at infinity as we have been 
doing with limits at other points xo; for instance, it turns out that all of the limit laws 
continue to hold. However, as we will not be using these limits much in this text, 
we will not devote much attention to these matters. We will note though that this 
definition is consistent with the notion of a limit $\lim_{n \to \infty} a_n$ of a sequence (Exercise
9.10.1).


— Exercises — 


Exercise 9.10.1 Let $(a_n)_{n=0}^\infty$ be a sequence of real numbers; then $a_n$ can also be thought of as a
function from $\mathbf{N}$ to $\mathbf{R}$, which takes each natural number $n$ to a real number $a_n$. Show that

\[ \lim_{n \to +\infty; n \in \mathbf{N}} a_n = \lim_{n \to \infty} a_n \]

where the left-hand limit is defined by Definition 9.10.3 and the right-hand limit is defined by 
Definition 6.1.8. More precisely, show that if one of the above two limits exists then so does the 
other, and then they both have the same value. Thus the two notions of limit here are compatible. 


Chapter 10
Differentiation of Functions


10.1 Basic Definitions 


We can now begin the rigorous treatment of calculus in earnest, starting with the 
notion of a derivative. We can now define derivatives analytically, using limits, in 
contrast to the geometric definition of derivatives, which uses tangents. The advantage 
of working analytically is that (a) we do not need to know the axioms of geometry, 
and (b) these definitions can be modified to handle functions of several variables, or 
functions whose values are vectors instead of scalars. Furthermore, one’s geometric
intuition becomes difficult to rely on once one has more than three dimensions in 
play. (Conversely, one can use one’s experience in analytic rigor to extend one’s 
geometric intuition to such abstract settings; as mentioned earlier, the two viewpoints 
complement rather than oppose each other.) 


Definition 10.1.1 (Differentiability at a point) Let $X$ be a subset of $\mathbf{R}$, and let $x_0 \in X$
be an element of $X$ which is also a limit point of $X$. Let $f: X \to \mathbf{R}$ be a function.
If the limit

\[ \lim_{x \to x_0; x \in X \setminus \{x_0\}} \frac{f(x) - f(x_0)}{x - x_0} \]

converges to some real number $L$, then we say that $f$ is differentiable at $x_0$ on $X$
with derivative $L$ and write $f'(x_0) := L$. If the limit does not exist, or if $x_0$ is not an
element of $X$ or not a limit point of $X$, we leave $f'(x_0)$ undefined and say that $f$ is
not differentiable at $x_0$ on $X$.


Remark 10.1.2 Note that we need $x_0$ to be a limit point in order for $x_0$ to be adherent
to $X \setminus \{x_0\}$, otherwise the limit

\[ \lim_{x \to x_0; x \in X \setminus \{x_0\}} \frac{f(x) - f(x_0)}{x - x_0} \]

would automatically be undefined. In particular, we do not define the derivative of
a function at an isolated point; for instance, if one restricts the function $f: \mathbf{R} \to \mathbf{R}$
defined by $f(x) := x^2$ to the domain $X := [1, 2] \cup \{3\}$, then the restriction of the
function ceases to be differentiable at $3$. (See however Exercise 10.1.1 below.) In
practice, the domain $X$ will almost always be an interval, and so by Lemma 9.1.21
all elements $x_0$ of $X$ will automatically be limit points and we will not have to care
much about these issues.


Example 10.1.3 Let $f: \mathbf{R} \to \mathbf{R}$ be the function $f(x) := x^2$, and let $x_0$ be any real
number. To see whether $f$ is differentiable at $x_0$ on $\mathbf{R}$, we compute the limit

\[ \lim_{x \to x_0; x \in \mathbf{R} \setminus \{x_0\}} \frac{f(x) - f(x_0)}{x - x_0} = \lim_{x \to x_0; x \in \mathbf{R} \setminus \{x_0\}} \frac{x^2 - x_0^2}{x - x_0}. \]

We can factor the numerator as $(x^2 - x_0^2) = (x - x_0)(x + x_0)$. Since $x \in \mathbf{R} \setminus \{x_0\}$,
we may legitimately cancel the factors of $x - x_0$ and write the above limit as

\[ \lim_{x \to x_0; x \in \mathbf{R} \setminus \{x_0\}} x + x_0 \]

which by limit laws is equal to $2x_0$. Thus the function $f(x)$ is differentiable at $x_0$
and its derivative there is $2x_0$.
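
The cancellation in Example 10.1.3 can be observed numerically: for $f(x) = x^2$ the difference quotient at $x_0$ equals $x + x_0$ exactly whenever $x \ne x_0$. The Python sketch below evaluates it in rational arithmetic at $x_0 = 3$; the helper name `difference_quotient` and the chosen sample points are hypothetical.

```python
from fractions import Fraction


def difference_quotient(f, x0, x):
    """(f(x) - f(x0)) / (x - x0), defined only for x != x0."""
    return (f(x) - f(x0)) / (x - x0)


f = lambda x: x * x
x0 = Fraction(3)
# Evaluate at x = x0 + 10^(-k) for k = 1, ..., 5; each value is x + x0.
quotients = [difference_quotient(f, x0, x0 + Fraction(1, 10**k)) for k in range(1, 6)]
```

The quotients approach $2 x_0 = 6$, with the error equal to the distance $x - x_0$.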


Remark 10.1.4 This point is trivial, but it is worth mentioning: if $f: X \to \mathbf{R}$ is
differentiable at $x_0$, and $g: X \to \mathbf{R}$ is equal to $f$ (i.e., $g(x) = f(x)$ for all $x \in
X$), then $g$ is also differentiable at $x_0$ and $g'(x_0) = f'(x_0)$ (why?). However, if two
functions $f$ and $g$ merely have the same value at $x_0$, i.e., $g(x_0) = f(x_0)$, this does
not imply that $g'(x_0) = f'(x_0)$. (Can you see a counterexample?) Thus there is a
big difference between two functions being equal on their whole domain and merely
being equal at one point.


Remark 10.1.5 One sometimes writes $\frac{df}{dx}$ instead of $f'$. This notation is of course
very familiar and convenient, but one has to be a little careful, because it is only safe
to use as long as $x$ is the only variable used to represent the input for $f$; otherwise
one can get into all sorts of trouble. For instance, the function $f: \mathbf{R} \to \mathbf{R}$ defined by
$f(x) := x^2$ has derivative $\frac{df}{dx} = 2x$, but the function $g: \mathbf{R} \to \mathbf{R}$ defined by $g(y) :=
y^2$ would seem to have derivative $\frac{dg}{dx} = 0$ if $y$ and $x$ are independent variables, despite
the fact that $g$ and $f$ are exactly the same function. Because of this possible source
of confusion, we will refrain from using the notation $\frac{df}{dx}$ whenever it could possibly
lead to confusion. (This confusion becomes even worse in the calculus of several
variables, and the standard notation of $\frac{\partial f}{\partial x}$ can lead to some serious ambiguities.
There are ways to resolve these ambiguities, most notably by introducing the notion
of differentiation along vector fields, but this is beyond the scope of this text.)


Example 10.1.6 Let $f: \mathbf{R} \to \mathbf{R}$ be the function $f(x) := |x|$, and let $x_0 = 0$. To see
whether $f$ is differentiable at $0$ on $\mathbf{R}$, we compute the limit

\[ \lim_{x \to 0; x \in \mathbf{R} \setminus \{0\}} \frac{f(x) - f(0)}{x - 0} = \lim_{x \to 0; x \in \mathbf{R} \setminus \{0\}} \frac{|x|}{x}. \]

Now we take left limits and right limits. The right limit is

\[ \lim_{x \to 0; x \in (0, \infty)} \frac{|x|}{x} = \lim_{x \to 0; x \in (0, \infty)} \frac{x}{x} = \lim_{x \to 0; x \in (0, \infty)} 1 = 1, \]

while the left limit is

\[ \lim_{x \to 0; x \in (-\infty, 0)} \frac{|x|}{x} = \lim_{x \to 0; x \in (-\infty, 0)} \frac{-x}{x} = \lim_{x \to 0; x \in (-\infty, 0)} -1 = -1, \]

and these limits do not match. Thus $\lim_{x \to 0; x \in \mathbf{R} \setminus \{0\}} \frac{|x|}{x}$ does not exist, and $f$ is not
differentiable at $0$ on $\mathbf{R}$. However, if one restricts $f$ to $[0, \infty)$, then the restricted
function $f|_{[0, \infty)}$ is differentiable at $0$ on $[0, \infty)$, with derivative $1$:

\[ \lim_{x \to 0; x \in [0, \infty) \setminus \{0\}} \frac{f(x) - f(0)}{x - 0} = \lim_{x \to 0; x \in (0, \infty)} \frac{|x|}{x} = 1. \]

Similarly, when one restricts $f$ to $(-\infty, 0]$, the restricted function $f|_{(-\infty, 0]}$ is
differentiable at $0$ on $(-\infty, 0]$, with derivative $-1$. Thus even when a function is not
differentiable, it is sometimes possible to restore the differentiability by restricting
the domain of the function.
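
The mismatch between the one-sided limits for $f(x) = |x|$ at $0$ shows up immediately when the difference quotient is evaluated on either side of $0$. The Python sketch below does this in rational arithmetic; the helper name `quotient_at_zero` and the sample points $\pm 10^{-k}$ are hypothetical choices.

```python
from fractions import Fraction


def quotient_at_zero(x):
    """Difference quotient (|x| - |0|) / (x - 0) of the absolute value, for x != 0."""
    return Fraction(abs(x)) / x


right = [quotient_at_zero(Fraction(1, 10**k)) for k in range(1, 6)]    # x in (0, infinity)
left = [quotient_at_zero(Fraction(-1, 10**k)) for k in range(1, 6)]    # x in (-infinity, 0)
```

Every quotient from the right is $1$ and every quotient from the left is $-1$, so no single limit can exist on $\mathbf{R} \setminus \{0\}$, while each one-sided restriction has a limit.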


If a function is differentiable at xo, then it is approximately linear near xo: 


Proposition 10.1.7 (Newton’s approximation) Let $X$ be a subset of $\mathbf{R}$, let $x_0 \in X$
be a limit point of $X$, let $f: X \to \mathbf{R}$ be a function, and let $L$ be a real number. Then
the following statements are logically equivalent:


(a) $f$ is differentiable at $x_0$ on $X$ with derivative $L$.
(b) For every $\varepsilon > 0$, there exists a $\delta > 0$ such that $f(x)$ is $\varepsilon |x - x_0|$-close to $f(x_0) +
L(x - x_0)$ whenever $x \in X$ is $\delta$-close to $x_0$, i.e., we have

\[ |f(x) - (f(x_0) + L(x - x_0))| \le \varepsilon |x - x_0| \]

whenever $x \in X$ and $|x - x_0| \le \delta$.


Remark 10.1.8 Newton’s approximation is of course named after the great scientist 
and mathematician Isaac Newton (1642-1727), one of the founders of differential 
and integral calculus. 


Proof See Exercise 10.1.2. 


Remark 10.1.9 We can phrase Proposition 10.1.7 in a more informal way: if $f$ is
differentiable at $x_0$, then one has the approximation $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$,
and conversely.
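
For a concrete instance of Newton's approximation, take $f(x) = x^2$, $x_0 = 3$ and $L = 6$. The error $|f(x) - (f(x_0) + L(x - x_0))|$ equals $(x - x_0)^2$, so the choice $\delta = \varepsilon$ satisfies statement (b) of Proposition 10.1.7. The Python sketch below checks this on a grid of sample points; the names `newton_error`, `xs` and `ok` are hypothetical, and a finite grid only illustrates the inequality.

```python
from fractions import Fraction


def newton_error(f, x0, L, x):
    """|f(x) - (f(x0) + L (x - x0))|, the error of the linear approximation."""
    return abs(f(x) - (f(x0) + L * (x - x0)))


f = lambda x: x * x
x0, L = Fraction(3), Fraction(6)
eps = Fraction(1, 100)
delta = eps  # for f(x) = x^2 the error is exactly (x - x0)^2, so delta = eps works
xs = [x0 + Fraction(k, 1000) for k in range(-10, 11)]  # all within delta of x0
ok = all(newton_error(f, x0, L, x) <= eps * abs(x - x0) for x in xs)
```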

As the example of the function $f: \mathbf{R} \to \mathbf{R}$ defined by $f(x) := |x|$ shows, a
function can be continuous at a point without being differentiable at that point.
However, the converse is true:


Proposition 10.1.10 (Differentiability implies continuity) Let $X$ be a subset of $\mathbf{R}$,
let $x_0 \in X$ be a limit point of $X$, and let $f: X \to \mathbf{R}$ be a function. If $f$ is differentiable
at $x_0$, then $f$ is also continuous at $x_0$.


Proof See Exercise 10.1.3. 


Definition 10.1.11 (Differentiability on a domain) Let $X$ be a subset of $\mathbf{R}$, and let
$f: X \to \mathbf{R}$ be a function. We say that $f$ is differentiable on $X$ if, for every limit
point $x_0 \in X$, the function $f$ is differentiable at $x_0$ on $X$.


From Proposition 10.1.10 and the above definition, as well as the fact that a 
function is automatically continuous at every isolated point of its domain, we have 
an immediate corollary: 


Corollary 10.1.12 Let X be a subset of R, and let f : X — R be a function which 
is differentiable on X. Then f is also continuous on X. 


Now we state the basic properties of derivatives which you are all familiar with. 


Theorem 10.1.13 [Differential calculus] Let $X$ be a subset of $\mathbf{R}$, let $x_0 \in X$ be a
limit point of $X$, and let $f: X \to \mathbf{R}$ and $g: X \to \mathbf{R}$ be functions.


(a) If $f$ is a constant function, i.e., there exists a real number $c$ such that $f(x) = c$
for all $x \in X$, then $f$ is differentiable at $x_0$ and $f'(x_0) = 0$.

(b) If $f$ is the identity function, i.e., $f(x) = x$ for all $x \in X$, then $f$ is differentiable
at $x_0$ and $f'(x_0) = 1$.

(c) (Sum rule) If $f$ and $g$ are differentiable at $x_0$, then $f + g$ is also differentiable
at $x_0$, and $(f + g)'(x_0) = f'(x_0) + g'(x_0)$.

(d) (Product rule) If $f$ and $g$ are differentiable at $x_0$, then $fg$ is also differentiable
at $x_0$, and $(fg)'(x_0) = f'(x_0) g(x_0) + f(x_0) g'(x_0)$.

(e) If $f$ is differentiable at $x_0$ and $c$ is a real number, then $cf$ is also differentiable
at $x_0$, and $(cf)'(x_0) = c f'(x_0)$.

(f) (Difference rule) If $f$ and $g$ are differentiable at $x_0$, then $f - g$ is also differentiable
at $x_0$, and $(f - g)'(x_0) = f'(x_0) - g'(x_0)$.

(g) If $g$ is differentiable at $x_0$, and $g$ is non-zero on $X$ (i.e., $g(x) \ne 0$ for all $x \in X$),
then $1/g$ is also differentiable at $x_0$, and $\left(\frac{1}{g}\right)'(x_0) = -\frac{g'(x_0)}{g(x_0)^2}$.

(h) (Quotient rule) If $f$ and $g$ are differentiable at $x_0$, and $g$ is non-zero on $X$, then
$f/g$ is also differentiable at $x_0$, and

\[ \left(\frac{f}{g}\right)'(x_0) = \frac{f'(x_0) g(x_0) - f(x_0) g'(x_0)}{g(x_0)^2}. \]

Remark 10.1.14 The product rule is also known as the Leibniz rule, after Gottfried
Leibniz (1646-1716), who was the other founder of differential and integral calculus
besides Newton.


Proof See Exercise 10.1.4. 


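As you are well aware, the above rules allow one to compute many derivatives
easily. For instance, if $f: \mathbf{R} \setminus \{1\} \to \mathbf{R}$ is the function $f(x) := \frac{x-2}{x-1}$, then it is easy
to use the above rules to show that $f'(x_0) = \frac{1}{(x_0 - 1)^2}$ for all $x_0 \in \mathbf{R} \setminus \{1\}$. (Why? Note
that every point $x_0$ in $\mathbf{R} \setminus \{1\}$ is a limit point of $\mathbf{R} \setminus \{1\}$.)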
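
The rules of Theorem 10.1.13 are mechanical enough to be carried out by a program. The Python sketch below propagates a pair (value, derivative) through sums, differences, products and quotients, and applies this to $f(x) = \frac{x-2}{x-1}$. The class `Dual` and the helper `derivative_at` are hypothetical names introduced for illustration; this device (often called forward-mode automatic differentiation) is not part of the text, and it takes the differentiation rules as given rather than proving them.

```python
from fractions import Fraction


class Dual:
    """A pair (value, derivative) of a function at a fixed point x0."""

    def __init__(self, val, der=0):
        self.val = Fraction(val)
        self.der = Fraction(der)

    @staticmethod
    def lift(other):
        # Constants have derivative 0 (part (a) of the theorem).
        return other if isinstance(other, Dual) else Dual(other, 0)

    def __add__(self, other):       # sum rule (c)
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    def __sub__(self, other):       # difference rule (f)
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __mul__(self, other):       # product rule (d)
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    def __truediv__(self, other):   # quotient rule (h)
        o = Dual.lift(other)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / (o.val * o.val))


def derivative_at(func, x0):
    # The identity function has derivative 1 (part (b) of the theorem).
    return func(Dual(x0, 1)).der


def f(x):
    return (x - 2) / (x - 1)


value_at_3 = derivative_at(f, Fraction(3))
```

At $x_0 = 3$ the rules give $1/(3 - 1)^2 = 1/4$, in agreement with the formula $f'(x_0) = \frac{1}{(x_0 - 1)^2}$.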

Another fundamental property of differentiable functions is the following: 


Theorem 10.1.15 [Chain rule] Let $X, Y$ be subsets of $\mathbf{R}$, let $x_0 \in X$ be a limit
point of $X$, and let $y_0 \in Y$ be a limit point of $Y$. Let $f: X \to Y$ be a function
such that $f(x_0) = y_0$, and such that $f$ is differentiable at $x_0$. Suppose that $g: Y \to
\mathbf{R}$ is a function which is differentiable at $y_0$. Then the function $g \circ f: X \to \mathbf{R}$ is
differentiable at $x_0$, and

\[ (g \circ f)'(x_0) = g'(y_0) f'(x_0). \]

Proof See Exercise 10.1.7.


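Example 10.1.16 If $f: \mathbf{R} \setminus \{1\} \to \mathbf{R}$ is the function $f(x) := \frac{x-2}{x-1}$, and $g: \mathbf{R} \to \mathbf{R}$ is
the function $g(y) := y^2$, then $g \circ f(x) = \left(\frac{x-2}{x-1}\right)^2$, and the chain rule gives

\[ (g \circ f)'(x_0) = 2 \left(\frac{x_0 - 2}{x_0 - 1}\right) \frac{1}{(x_0 - 1)^2}. \]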
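
The chain rule value for $f(x) = \frac{x-2}{x-1}$ and $g(y) = y^2$ can be compared against a difference quotient of $g \circ f$ with a small step. In the Python sketch below, the function `chain_rule_value` encodes $g'(f(x_0)) f'(x_0)$ using $g'(y) = 2y$ and $f'(x_0) = \frac{1}{(x_0 - 1)^2}$; the step size `h` and the point $x_0 = 3$ are hypothetical choices, and agreement up to a small error is a consistency check, not a proof.

```python
from fractions import Fraction


def f(x):
    return (x - 2) / (x - 1)


def g(y):
    return y * y


def chain_rule_value(x0):
    """g'(f(x0)) * f'(x0), using g'(y) = 2y and f'(x0) = 1/(x0 - 1)^2."""
    return 2 * f(x0) * (1 / (x0 - 1) ** 2)


x0 = Fraction(3)
h = Fraction(1, 10**6)
quotient = (g(f(x0 + h)) - g(f(x0))) / h   # difference quotient of g o f at x0
predicted = chain_rule_value(x0)
```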


Remark 10.1.17 If one writes $y$ for $f(x)$, and $z$ for $g(y)$, then the chain rule can be
written in the more visually appealing manner $\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}$. However, this notation can
be misleading (for instance it blurs the distinction between dependent variable and
independent variable, especially for $y$) and leads one to believe that the quantities
$dz$, $dy$, $dx$ can be manipulated like real numbers. However, these quantities are
not real numbers (in fact, we have not assigned any meaning to them at all), and
treating them as such can lead to problems in the future. For instance, if $f$ depends
on $x_1$ and $x_2$, which depend on $t$, then the chain rule for several variables asserts that
$\frac{df}{dt} = \frac{\partial f}{\partial x_1} \frac{dx_1}{dt} + \frac{\partial f}{\partial x_2} \frac{dx_2}{dt}$, but this rule might seem suspect if one treated $df$, $dt$, etc. as
real numbers. It is possible to think of $dy$, $dx$, etc. as “infinitesimal real numbers”
if one knows what one is doing, but for those just starting out in analysis, I would
not recommend this approach, especially if one wishes to work rigorously. (There
is a way to make all of this rigorous, even for the calculus of several variables, but
it requires the notion of a tangent vector and the derivative map, both of which are
beyond the scope of this text.)
— Exercises —


Exercise 10.1.1 Suppose that $X$ is a subset of $\mathbf{R}$, $x_0$ is a limit point of $X$, and $f: X \to \mathbf{R}$ is a
function which is differentiable at $x_0$. Let $Y \subset X$ be such that $x_0 \in Y$, and $x_0$ is also a limit point
of $Y$. Prove that the restricted function $f|_Y : Y \to \mathbf{R}$ is also differentiable at $x_0$ and has the same
derivative as $f$ at $x_0$. Explain why this does not contradict the discussion in Remark 10.1.2.


Exercise 10.1.2 Prove Proposition 10.1.7. (Hint: the cases $x = x_0$ and $x \neq x_0$ have to be treated
separately.)


Exercise 10.1.3 Prove Proposition 10.1.10. (Hint: either use the limit laws (Proposition 9.3.14) or 
use Proposition 10.1.7.) 


Exercise 10.1.4 Prove Theorem 10.1.13. (Hint: use the limit laws in Proposition 9.3.14. Use earlier
parts of this theorem to prove the latter. For the product rule, use the identity

\begin{align*}
f(x)g(x) - f(x_0)g(x_0)
&= f(x)g(x) - f(x)g(x_0) + f(x)g(x_0) - f(x_0)g(x_0) \\
&= f(x)(g(x) - g(x_0)) + (f(x) - f(x_0))g(x_0);
\end{align*}


this trick of adding and subtracting an intermediate term is sometimes known as the “middle-man 
trick” and is very useful in analysis.) 


Exercise 10.1.5 Let $n$ be a natural number, and let $f: \mathbf{R} \to \mathbf{R}$ be the function $f(x) := x^n$. Show
that $f$ is differentiable on $\mathbf{R}$ and $f'(x) = nx^{n-1}$ for all $x \in \mathbf{R}$, adopting the convention that $nx^{n-1}$
is 0 when $n = 0$. (Hint: use Theorem 10.1.13 and induction.)


Exercise 10.1.6 Let $n$ be a negative integer, and let $f: \mathbf{R}\setminus\{0\} \to \mathbf{R}$ be the function $f(x) := x^n$.
Show that $f$ is differentiable on $\mathbf{R}\setminus\{0\}$, and that $f'(x) = nx^{n-1}$ for all $x \in \mathbf{R}\setminus\{0\}$. (Hint: use
Theorem 10.1.13 and Exercise 10.1.5.)


Exercise 10.1.7 Prove Theorem 10.1.15. (Hint: one way to do this is via Newton’s approximation,
Proposition 10.1.7. Another way is to use Proposition 9.3.9 and Proposition 10.1.10 to convert
this problem into one involving limits of sequences; however with the latter strategy one
has to treat the case $f'(x_0) = 0$ separately, as some division-by-zero subtleties can occur in that
case.)


10.2 Local Maxima, Local Minima, and Derivatives 


As you learnt in your basic calculus courses, one very common application of using 
derivatives is to locate maxima and minima. We now present this material again, but 
this time in a rigorous manner. 

The notion of a function $f: X \to \mathbf{R}$ attaining a maximum or minimum at a point
$x_0 \in X$ was defined in Definition 9.6.5. We now localize this definition:


Definition 10.2.1 (Local maxima and minima) Let $X$ be a subset of $\mathbf{R}$, let $f: X \to \mathbf{R}$
be a function, and let $x_0 \in X$. We say that $f$ attains a local maximum at $x_0$ iff there
exists a $\delta > 0$ such that the restriction $f|_{X \cap (x_0-\delta, x_0+\delta)}$ of $f$ to $X \cap (x_0 - \delta, x_0 + \delta)$
attains a maximum at $x_0$. We say that $f$ attains a local minimum at $x_0$ iff there exists
a $\delta > 0$ such that the restriction $f|_{X \cap (x_0-\delta, x_0+\delta)}$ of $f$ to $X \cap (x_0 - \delta, x_0 + \delta)$ attains
a minimum at $x_0$.
Remark 10.2.2 If f attains a maximum at xo, we sometimes say that f attains a
global maximum at xo, in order to distinguish it from the local maxima defined here. 
Note that if f attains a global maximum at xo, then it certainly also attains a local 
maximum at this xo, and similarly for minima. 


Example 10.2.3 Let $f: \mathbf{R} \to \mathbf{R}$ denote the function $f(x) := x^2 - x^4$. This function
does not attain a global minimum at 0, since for example $f(2) = -12 < 0 = f(0)$,
however it does attain a local minimum, for if we choose $\delta := 1$ and restrict $f$ to
the interval $(-1, 1)$, then for all $x \in (-1, 1)$ we have $x^4 \le x^2$ and thus $f(x) =
x^2 - x^4 \ge 0 = f(0)$, and so $f|_{(-1,1)}$ has a (global) minimum at 0.
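
The two claims in this example can be spot-checked numerically. The sketch below assumes $f(x) = x^2 - x^4$ and tests only finitely many sample points of $(-1, 1)$, so it illustrates the inequality rather than proving it; the grid spacing and the variable names are arbitrary choices for this sketch.

```python
def f(x):
    # f(x) = x^2 - x^4
    return x**2 - x**4


# Finitely many sample points of the open interval (-1, 1); 0 is among them.
samples = [k / 1000 for k in range(-999, 1000)]

# f(x) >= f(0) at every sampled point: consistent with a local minimum at 0.
is_min_on_samples = all(f(x) >= f(0) for x in samples)

# Outside (-1, 1) the inequality can fail, so 0 is not a global minimum.
value_at_2 = f(2)

print(is_min_on_samples, value_at_2)
```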

Example 10.2.4 Let $f: \mathbf{Z} \to \mathbf{R}$ be the function $f(x) = x$, defined on the integers
only. Then $f$ has no global maximum or global minimum (why?), but attains both a
local maximum and local minimum at every integer $n$ (why?).


Remark 10.2.5 If $f: X \to \mathbf{R}$ attains a local maximum at a point $x_0$ in $X$, and $Y \subset X$
is a subset of $X$ which contains $x_0$, then the restriction $f|_Y : Y \to \mathbf{R}$ also attains a
local maximum at $x_0$ (why?). Similarly for minima.


The connection between local maxima, minima, and derivatives is the following. 


Proposition 10.2.6 (Local extrema are stationary) Let $a < b$ be real numbers, and
let $f: (a,b) \to \mathbf{R}$ be a function. If $x_0 \in (a, b)$, $f$ is differentiable at $x_0$, and $f$ attains
either a local maximum or a local minimum at $x_0$, then $f'(x_0) = 0$.


Proof See Exercise 10.2.1. 


Note that f must be differentiable for this proposition to work; see Exercise 10.2.2. 
Also, this proposition does not work if the open interval (a, b) is replaced by a closed 
interval [a, b]. For instance, the function $f: [1,2] \to \mathbf{R}$ defined by $f(x) := x$ has a
local maximum at $x_0 = 2$ and a local minimum at $x_0 = 1$ (in fact, these local extrema
are global extrema), but at both points the derivative is $f'(x_0) = 1$, not $f'(x_0) = 0$.
Thus the endpoints of an interval can be local maxima or minima even if the derivative 
is not zero there. Finally, the converse of this proposition is false (Exercise 10.2.3). 

By combining Proposition 10.2.6 with the maximum principle, one can obtain 


Theorem 10.2.7 [Rolle’s theorem] Let $a < b$ be real numbers, and let $g: [a, b] \to \mathbf{R}$
be a continuous function which is differentiable on $(a, b)$. Suppose also that $g(a) =
g(b)$. Then there exists an $x \in (a, b)$ such that $g'(x) = 0$.


Proof See Exercise 10.2.4. 


Remark 10.2.8 Note that we only assume $g$ is differentiable on the open interval
$(a, b)$, though of course the theorem also holds if we assume $g$ is differentiable on
the closed interval $[a, b]$, since this is larger than $(a, b)$.


Rolle’s theorem has an important corollary. 
Corollary 10.2.9 (Mean-value theorem) Let $a < b$ be real numbers, and let $f: [a, b] \to \mathbf{R}$
be a function which is continuous on $[a, b]$ and differentiable on $(a, b)$. Then
there exists an $x \in (a,b)$ such that

\[ f'(x) = \frac{f(b) - f(a)}{b - a}. \]


Proof See Exercise 10.2.5. 
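
To make the statement concrete, here is a small Python sketch that locates such a point $x$ for one hypothetical choice, $f(x) = x^3$ on $[0, 2]$. The mean-value theorem only asserts that $x$ exists; the bisection used below is not part of the theorem and works here only because this particular derivative happens to be continuous and increasing on the interval. All names in the sketch are illustrative.

```python
def f(x):
    return x**3


def f_prime(x):
    # derivative of x^3, computed by hand
    return 3.0 * x**2


a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)  # slope of the chord from (a, f(a)) to (b, f(b))

# Bisection for f'(x) = slope; valid here since f' is increasing on [a, b]
# with f'(a) < slope < f'(b).
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2.0
    if f_prime(mid) < slope:
        lo = mid
    else:
        hi = mid
x = (lo + hi) / 2.0

print(x, f_prime(x), slope)
```

For this choice the chord has slope 4, and the point found is approximately $2/\sqrt{3}$, which lies strictly between 0 and 2.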


— Exercises — 
Exercise 10.2.1 Prove Proposition 10.2.6. 


Exercise 10.2.2 Give an example of a function $f: (-1, 1) \to \mathbf{R}$ which is continuous and attains
a global maximum at 0, but which is not differentiable at 0. Explain why this does not contradict 
Proposition 10.2.6. 


Exercise 10.2.3 Give an example of a function f: (—1, 1) — R which is differentiable, and whose 
derivative equals 0 at 0, but such that 0 is neither a local minimum nor a local maximum. Explain 
why this does not contradict Proposition 10.2.6. 


Exercise 10.2.4 Prove Theorem 10.2.7. (Hint: use the maximum principle, Proposition 9.6.7,
followed by Proposition 10.2.6. Note that the maximum principle does not tell you whether the
maximum or minimum is in the open interval $(a, b)$ or is one of the boundary points $a$, $b$, so you have
to divide into cases and use the hypothesis $g(a) = g(b)$ somehow.)


Exercise 10.2.5 Use Theorem 10.2.7 to prove Corollary 10.2.9. (Hint: consider a function of the 
form f(x) — cx for some carefully chosen real number c.) 


Exercise 10.2.6 Let $M > 0$, and let $f: [a,b] \to \mathbf{R}$ be a function which is continuous on $[a, b]$
and differentiable on $(a, b)$, and such that $|f'(x)| \le M$ for all $x \in (a, b)$ (i.e., the derivative of $f$
is bounded). Show that for any $x, y \in [a, b]$ we have the inequality $|f(x) - f(y)| \le M|x - y|$.
(Hint: apply the mean-value theorem (Corollary 10.2.9) to a suitable restriction of $f$.) Functions
which obey the bound $|f(x) - f(y)| \le M|x - y|$ are known as Lipschitz continuous functions with
Lipschitz constant $M$; thus this exercise shows that functions with bounded derivative are Lipschitz
continuous.


Exercise 10.2.7 Let $f: \mathbf{R} \to \mathbf{R}$ be a differentiable function such that $f'$ is bounded. Show that $f$
is uniformly continuous. (Hint: use the preceding exercise.) 


10.3 Monotone Functions and Derivatives


In your elementary calculus courses, you may have come across the assertion that a 
positive derivative meant an increasing function, and a negative derivative meant a 
decreasing function. This statement is not completely accurate, but it is pretty close; 
we now give the precise version of these statements below. 


Proposition 10.3.1 Let $X$ be a subset of $\mathbf{R}$, let $x_0 \in X$ be a limit point of $X$, and let
$f: X \to \mathbf{R}$ be a function. If $f$ is monotone increasing and $f$ is differentiable at
$x_0$, then $f'(x_0) \ge 0$. If $f$ is monotone decreasing and $f$ is differentiable at $x_0$, then
$f'(x_0) \le 0$.


Proof See Exercise 10.3.1. 
Remark 10.3.2 We have to assume that $f$ is differentiable at $x_0$; there exist monotone
functions which are not always differentiable (see Exercise 10.3.2), and of course if $f$
is not differentiable at $x_0$ we cannot possibly conclude that $f'(x_0) \ge 0$ or $f'(x_0) \le 0$.


One might naively guess that if $f$ were strictly monotone increasing, and $f$ was
differentiable at $x_0$, then the derivative $f'(x_0)$ would be strictly positive instead of
merely non-negative. Unfortunately, this is not always the case (Exercise 10.3.3).

On the other hand, we do have a converse result: if a function has strictly positive
derivative, then it must be strictly monotone increasing:


Proposition 10.3.3 Let $a < b$, and let $f: [a,b] \to \mathbf{R}$ be a differentiable function. If
$f'(x) > 0$ for all $x \in [a, b]$, then $f$ is strictly monotone increasing. If $f'(x) < 0$ for
all $x \in [a, b]$, then $f$ is strictly monotone decreasing. If $f'(x) = 0$ for all $x \in [a, b]$,
then $f$ is a constant function.


Proof See Exercise 10.3.4. 


— Exercises — 
Exercise 10.3.1 Prove Proposition 10.3.1. 


Exercise 10.3.2 Give an example of a function $f: (-1, 1) \to \mathbf{R}$ which is continuous and monotone
increasing, but which is not differentiable at 0. Explain why this does not contradict Proposition
10.3.1.


Exercise 10.3.3 Give an example of a function f: R — R which is strictly monotone increasing 
and differentiable, but whose derivative at 0 is zero. Explain why this does not contradict Proposition 
10.3.1 or Proposition 10.3.3. (Hint: look at Exercise 10.2.3.) 


Exercise 10.3.4 Prove Proposition 10.3.3. (Hint: you do not have integrals or the fundamental 
theorem of calculus yet, so these tools cannot be used. However, one can proceed via the mean- 
value theorem, Corollary 10.2.9.) 


Exercise 10.3.5 Give an example of a subset $X \subset \mathbf{R}$ and a function $f: X \to \mathbf{R}$ which is
differentiable on $X$, is such that $f'(x) > 0$ for all $x \in X$, but $f$ is not strictly monotone increasing. (Hint:
the conditions here are subtly different from those in Proposition 10.3.3. What is the difference,
and how can one exploit that difference to obtain the example?)


10.4 Inverse Functions and Derivatives 


We now ask the following question: if we know that a function $f: X \to Y$ is differentiable,
and it has an inverse $f^{-1}: Y \to X$, what can we say about the differentiability
of $f^{-1}$? This will be useful for many applications, for instance if we want to
differentiate the function $f(x) := x^{1/n}$.

We begin with a preliminary result. 
Lemma 10.4.1 Let $X, Y$ be subsets of $\mathbf{R}$, and let $f: X \to Y$ be an invertible function,
with inverse $f^{-1}: Y \to X$. Suppose that $x_0 \in X$ and $y_0 \in Y$ are limit points of
$X$, $Y$, respectively, such that $y_0 = f(x_0)$ (which also implies that $x_0 = f^{-1}(y_0)$). If
$f$ is differentiable at $x_0$, and $f^{-1}$ is differentiable at $y_0$, then

\[ (f^{-1})'(y_0) = \frac{1}{f'(x_0)}. \]


Proof From the chain rule (Theorem 10.1.15) we have

\[ (f^{-1} \circ f)'(x_0) = (f^{-1})'(y_0) f'(x_0). \]

But $f^{-1} \circ f$ is the identity function on $X$, and hence by Theorem 10.1.13(b)
$(f^{-1} \circ f)'(x_0) = 1$. The claim follows.


As a particular corollary of Lemma 10.4.1, we see that if $f$ is differentiable at $x_0$
with $f'(x_0) = 0$, then $f^{-1}$ cannot be differentiable at $y_0 = f(x_0)$, since $1/f'(x_0)$ is
undefined in that case. Thus for instance, the function $g: [0, \infty) \to [0, \infty)$ defined
by $g(y) := y^{1/3}$ cannot be differentiable at 0, since this function is the inverse $g =
f^{-1}$ of the function $f: [0, \infty) \to [0, \infty)$ defined by $f(x) := x^3$, and this function
has a derivative of 0 at $f^{-1}(0) = 0$.
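
One can see this failure of differentiability numerically. The following sketch (an illustration under the assumption $g(y) = y^{1/3}$ on $[0, \infty)$, with arbitrarily chosen step sizes) prints the difference quotients $(g(h) - g(0))/h$ for shrinking $h > 0$; they equal $h^{-2/3}$ and grow without bound instead of settling near a limit.

```python
def g(y):
    # cube root on [0, infinity)
    return y ** (1.0 / 3.0)


# Shrinking positive step sizes h (an arbitrary choice for this sketch).
steps = [10.0 ** (-k) for k in (3, 6, 9, 12)]

# Difference quotients of g at 0; mathematically these equal h^(-2/3).
quotients = [(g(h) - g(0.0)) / h for h in steps]

print(quotients)
```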

If one writes $y = f(x)$, so that $x = f^{-1}(y)$, then one can write the conclusion
of Lemma 10.4.1 in the more appealing form dx/dy = 1/(dy/dx). However, as 
mentioned before, this way of writing things, while very convenient and easy to 
remember, can be misleading and cause errors if applied too carelessly (especially 
when one begins to work in the calculus of several variables). 

Lemma 10.4.1 seems to answer the question of how to differentiate the inverse
of a function; however it has one significant drawback: the lemma only works if one
assumes a priori that $f^{-1}$ is differentiable. Thus, if one does not already know that
$f^{-1}$ is differentiable, one cannot use Lemma 10.4.1 to compute the derivative of
$f^{-1}$.

However, the following improved version of Lemma 10.4.1 will compensate for
this fact, by relaxing the requirement on $f^{-1}$ from differentiability to continuity.


Theorem 10.4.2 [Inverse function theorem] Let $X, Y$ be subsets of $\mathbf{R}$, and let
$f: X \to Y$ be an invertible function, with inverse $f^{-1}: Y \to X$. Suppose that
$x_0 \in X$ and $y_0 \in Y$ are limit points of $X$, $Y$, respectively, such that $f(x_0) = y_0$.
If $f$ is differentiable at $x_0$, $f^{-1}$ is continuous at $y_0$, and $f'(x_0) \neq 0$, then $f^{-1}$ is
differentiable at $y_0$ and

\[ (f^{-1})'(y_0) = \frac{1}{f'(x_0)}. \]

Proof We have to show that

\[ \lim_{y \to y_0;\, y \in Y\setminus\{y_0\}} \frac{f^{-1}(y) - f^{-1}(y_0)}{y - y_0} = \frac{1}{f'(x_0)}. \]

By Proposition 9.3.9, it suffices to show that

\[ \lim_{n \to \infty} \frac{f^{-1}(y_n) - f^{-1}(y_0)}{y_n - y_0} = \frac{1}{f'(x_0)} \]

for any sequence $(y_n)_{n=1}^{\infty}$ of elements in $Y\setminus\{y_0\}$ which converges to $y_0$.

To prove this, we set $x_n := f^{-1}(y_n)$. Then $(x_n)_{n=1}^{\infty}$ is a sequence of elements in
$X\setminus\{x_0\}$. (Why? Note that $f^{-1}$ is a bijection.) Since $f^{-1}$ is continuous at $y_0$ by assumption,
we know that $x_n = f^{-1}(y_n)$ converges to $f^{-1}(y_0) = x_0$ as $n \to \infty$. Thus, since $f$
is differentiable at $x_0$, we have (by Proposition 9.3.9 again)

\[ \lim_{n \to \infty} \frac{f(x_n) - f(x_0)}{x_n - x_0} = f'(x_0). \]

But since $x_n \neq x_0$ and $f$ is a bijection, the fraction $\frac{f(x_n) - f(x_0)}{x_n - x_0}$ is non-zero. Also, by
hypothesis $f'(x_0)$ is non-zero. So by limit laws

\[ \lim_{n \to \infty} \frac{x_n - x_0}{f(x_n) - f(x_0)} = \frac{1}{f'(x_0)}. \]

But since $x_n = f^{-1}(y_n)$ and $x_0 = f^{-1}(y_0)$, we thus have

\[ \lim_{n \to \infty} \frac{f^{-1}(y_n) - f^{-1}(y_0)}{y_n - y_0} = \frac{1}{f'(x_0)} \]


as desired. 
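
As an informal check of the formula $(f^{-1})'(y_0) = 1/f'(x_0)$, the sketch below uses the hypothetical example $f(x) = x^3$ on $(0, \infty)$ with $x_0 = 2$ and $y_0 = 8$, for which $f'(x_0) = 12$ is non-zero. It compares a symmetric difference quotient of the inverse at $y_0$ with $1/12$; the step size and names are arbitrary, and a numerical agreement of this kind is of course no substitute for the proof.

```python
def f(x):
    return x**3


def f_inverse(y):
    # inverse of f on (0, infinity): the cube root
    return y ** (1.0 / 3.0)


x0 = 2.0
y0 = f(x0)                  # y0 = 8
f_prime_x0 = 3.0 * x0**2    # f'(x0) = 12, computed by hand

step = 1e-6
# Symmetric difference quotient of the inverse function at y0.
numeric = (f_inverse(y0 + step) - f_inverse(y0 - step)) / (2.0 * step)
predicted = 1.0 / f_prime_x0

print(numeric, predicted)
```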


We give some applications of the inverse function theorem in the exercises below. 


— Exercises — 


Exercise 10.4.1 Let $n \ge 1$ be a natural number, and let $g: (0,\infty) \to (0, \infty)$ be the function
$g(x) := x^{1/n}$.
(a) Show that $g$ is continuous on $(0, \infty)$. (Hint: use Proposition 9.4.11.)


(b) Show that $g$ is differentiable on $(0, \infty)$, and that $g'(x) = \frac{1}{n}x^{\frac{1}{n}-1}$ for all $x \in (0, \infty)$. (Hint:
use the inverse function theorem and (a).)


Exercise 10.4.2 Let $q$ be a rational number, and let $f: (0, \infty) \to \mathbf{R}$ be the function $f(x) = x^q$.


(a) Show that $f$ is differentiable on $(0, \infty)$ and that $f'(x) = qx^{q-1}$. (Hint: use Exercise 10.4.1
and the laws of differential calculus in Theorem 10.1.13 and Theorem 10.1.15.)

(b) Show that $\lim_{x \to 1;\, x \in (0, \infty)\setminus\{1\}} \frac{x^q - 1}{x - 1} = q$ for every rational number $q$. (Hint: use part (a) and
Definition 10.1.1. An alternate route is to apply L’Hôpital’s rule from the next section.)


Exercise 10.4.3 Let $\alpha$ be a real number, and let $f: (0,\infty) \to \mathbf{R}$ be the function $f(x) = x^{\alpha}$.


(a) Show that $\lim_{x \to 1;\, x \in (0,\infty)\setminus\{1\}} \frac{f(x) - f(1)}{x - 1} = \alpha$. (Hint: use Exercise 10.4.2 and the comparison
principle; you may need to consider right and left limits separately. Proposition 5.4.14 may
also be helpful.)

(b) Show that $f$ is differentiable on $(0, \infty)$ and that $f'(x) = \alpha x^{\alpha-1}$. (Hint: use (a), exponent laws
(Proposition 6.7.3), and Definition 10.1.1.)
10.5 L’Hôpital’s Rule


Finally, we present a version of a rule you are all familiar with. 


Proposition 10.5.1 (L’Hôpital’s rule I) Let $X$ be a subset of $\mathbf{R}$, let $f: X \to \mathbf{R}$
and $g: X \to \mathbf{R}$ be functions, and let $x_0 \in X$ be a limit point of $X$. Suppose that
$f(x_0) = g(x_0) = 0$, that $f$ and $g$ are both differentiable at $x_0$, but $g'(x_0) \neq 0$. Then
there exists a $\delta > 0$ such that $g(x) \neq 0$ for all $x \in (X \cap (x_0 - \delta, x_0 + \delta))\setminus\{x_0\}$, and

\[ \lim_{x \to x_0;\, x \in (X \cap (x_0-\delta, x_0+\delta))\setminus\{x_0\}} \frac{f(x)}{g(x)} = \frac{f'(x_0)}{g'(x_0)}. \]


Proof See Exercise 10.5.1. 


The presence of the 6 here may seem somewhat strange, but is needed because 
$g(x)$ might vanish at some points other than $x_0$, which would imply that the quotient
$\frac{f(x)}{g(x)}$ is not necessarily defined at all points in $X\setminus\{x_0\}$.

A more sophisticated version of L’Hôpital’s rule is the following.


Proposition 10.5.2 (L’Hôpital’s rule II) Let $a < b$ be real numbers, and let $f: [a, b] \to \mathbf{R}$
and $g: [a,b] \to \mathbf{R}$ be functions which are continuous on $[a, b]$ and
differentiable on $(a, b)$. Suppose that $f(a) = g(a) = 0$, that $g'$ is non-zero on $(a, b)$
(i.e., $g'(x) \neq 0$ for all $x \in (a, b)$), and $\lim_{x \to a;\, x \in (a,b)} \frac{f'(x)}{g'(x)}$ exists and equals $L$. Then
$g(x) \neq 0$ for all $x \in (a,b]$, and $\lim_{x \to a;\, x \in (a,b]} \frac{f(x)}{g(x)}$ exists and equals $L$.


Remark 10.5.3 This proposition only considers limits to the right of a, but one can 
easily state and prove a similar proposition for limits to the left of a, or around both 
sides of a. Speaking very informally, the proposition states that 


\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}, \]

though one has to ensure all of the conditions of the proposition hold (in particular,
that $f(a) = g(a) = 0$, and that the right-hand limit exists), before one can apply
L’Hôpital’s rule.
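
For a concrete illustration, consider the hypothetical pair $f(x) = \sqrt{1+x} - 1$ and $g(x) = x$ on $[0, 1]$, with $a = 0$. Here $f(0) = g(0) = 0$, $g'(x) = 1$ is never zero, and $f'(x)/g'(x) = 1/(2\sqrt{1+x})$ tends to $L = 1/2$ as $x \to 0$. The Python sketch below evaluates both quotients along the sequence $x_n = 10^{-n}$; it is a numerical illustration under these assumptions, not a verification of the proposition, and the derivatives are entered by hand.

```python
import math


def f(x):
    return math.sqrt(1.0 + x) - 1.0


def g(x):
    return x


def f_prime(x):
    # derivative of sqrt(1 + x) - 1, computed by hand
    return 1.0 / (2.0 * math.sqrt(1.0 + x))


def g_prime(x):
    # derivative of g(x) = x
    return 1.0


# A sequence in (0, 1] converging to a = 0.
xs = [10.0 ** (-n) for n in range(1, 7)]

ratios = [f(x) / g(x) for x in xs]
derivative_ratios = [f_prime(x) / g_prime(x) for x in xs]

for x, r, d in zip(xs, ratios, derivative_ratios):
    print(x, r, d)
```

Both columns of quotients should approach 1/2 as the sample points approach 0.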


Proof (Optional) We first show that $g(x) \neq 0$ for all $x \in (a, b]$. Suppose for sake of
contradiction that $g(x) = 0$ for some $x \in (a, b]$. But since $g(a)$ is also zero, we can
apply Rolle’s theorem to obtain $g'(y) = 0$ for some $a < y < x$, but this contradicts
the hypothesis that $g'$ is non-zero on $(a, b)$.

Now we show that $\lim_{x \to a;\, x \in (a,b]} \frac{f(x)}{g(x)} = L$. By Proposition 9.3.9, it will suffice
to show that

\[ \lim_{n \to \infty} \frac{f(x_n)}{g(x_n)} = L \]

for any sequence $(x_n)_{n=1}^{\infty}$ taking values in $(a, b]$ which converges to $a$.
Consider a single $x_n$, and consider the function $h_n: [a, x_n] \to \mathbf{R}$ defined by

\[ h_n(x) := f(x)g(x_n) - g(x)f(x_n). \]

Observe that $h_n$ is continuous on $[a, x_n]$ and equals 0 at both $a$ and $x_n$, and is
differentiable on $(a, x_n)$ with derivative $h_n'(x) = f'(x)g(x_n) - g'(x)f(x_n)$. (Note that $f(x_n)$
and $g(x_n)$ are constants with respect to $x$.) By Rolle’s theorem (Theorem 10.2.7), we
can thus find $y_n \in (a, x_n)$ such that $h_n'(y_n) = 0$, which implies that

\[ \frac{f(x_n)}{g(x_n)} = \frac{f'(y_n)}{g'(y_n)}. \]

Since $y_n \in (a, x_n)$ for all $n$, and $x_n$ converges to $a$ as $n \to \infty$, we see from the
squeeze test (Corollary 6.4.14) that $y_n$ also converges to $a$ as $n \to \infty$. Thus $\frac{f'(y_n)}{g'(y_n)}$
converges to $L$, and thus $\frac{f(x_n)}{g(x_n)}$ also converges to $L$, as desired.


— Exercises — 


Exercise 10.5.1 Prove Proposition 10.5.1. (Hint: to show that $g(x) \neq 0$ near $x_0$, you may wish to
use Newton’s approximation (Proposition 10.1.7). For the rest of the proposition, use the limit laws, 
Proposition 9.3.14.) 


Exercise 10.5.2. Explain why Example 1.2.12 does not contradict either of the propositions in this 
section. 


Chapter 11
The Riemann Integral


In the previous chapter we reviewed differentiation—one of the two pillars of single 
variable calculus. The other pillar is, of course, integration, which is the focus of the 
current chapter. More precisely, we will turn to the definite integral, the integral of a 
function on a fixed interval, as opposed to the indefinite integral, otherwise known 
as the antiderivative. These two are of course linked by the Fundamental theorem of 
calculus, of which more will be said later. 

For us, the study of the definite integral will start with an interval $I$ which could
be open, closed, or half-open, and a function $f: I \to \mathbf{R}$, and will lead us to a number
$\int_I f$; we can write this integral as $\int_I f(x)\,dx$ (of course, we could replace $x$ by any
other dummy variable), or if $I$ has endpoints $a$ and $b$, we shall also write this integral
as $\int_a^b f$ or $\int_a^b f(x)\,dx$.

To actually define this integral $\int_I f$ is somewhat delicate (especially if one does
not want to assume any axioms concerning geometric notions such as area), and
not all functions $f$ are integrable. It turns out that there are at least two ways to
define this integral: the Riemann integral, named after Georg Riemann (1826-1866),
which we will do here and which suffices for most applications, and the Lebesgue
integral, named after Henri Lebesgue (1875-1941), which supersedes the Riemann
integral and works for a much larger class of functions. The Lebesgue integral will be
constructed in Chapter 8. There is also the Riemann-Stieltjes integral $\int_I f(x)\,d\alpha(x)$,
a generalization of the Riemann integral due to Thomas Stieltjes (1856-1894), which
we will discuss in Sect. 11.8.

Our strategy in defining the Riemann integral is as follows. We begin by first 
defining a notion of integration on a very simple class of functions—the piecewise 
constant functions. These functions are quite primitive, but their advantage is that 
integration is very easy for these functions, as is verifying all the usual properties. 
Then, we handle more general functions by approximating them by piecewise con- 
stant functions. 
11.1 Partitions


Before we can introduce the concept of an integral, we need to describe how one 
can partition a large interval into smaller intervals. In this chapter, all intervals will 
be bounded intervals (as opposed to the more general intervals defined in Definition 
9.1.1). 


Definition 11.1.1 Let X be a subset of R. We say that X is connected iff X is non- 
empty and the following property is true: whenever x, y are elements in X such that 
x < y, the bounded interval [x, y] is a subset of X (i.e., every number between x 
and y is also in X). 


Remark 11.1.2 Later on, in Section 2.4 we will define a more general notion of 
connectedness, which applies to any metric space. 


Examples 11.1.3 The set $[1, 2]$ is connected, because if $x < y$ both lie in $[1, 2]$, then
$1 \le x < y \le 2$, and so every element between $x$ and $y$ also lies in $[1, 2]$. A similar
argument shows that the set (1, 2) is connected. However, the set [1, 2] U [3, 4] is not 
connected (why?). The real line is connected (why?). All singleton sets such as {3} 
are connected, but for rather trivial reasons (these sets do not contain two elements 
x, y for which x < y). 


Lemma 11.1.4 Let X be a non-empty subset of the real line. Then the following two
statements are logically equivalent: 


(a) X is bounded and connected. 
(b) X is a bounded interval. 


Proof See Exercise 11.1.1. 


Remark 11.1.5 Recall that intervals are allowed to be singleton points (e.g., the 
degenerate interval [2, 2] = {2}), or even the empty set. 


Corollary 11.1.6 If $I$ and $J$ are bounded intervals, then the intersection $I \cap J$ is
also a bounded interval.


Proof See Exercise 11.1.2. 


Example 11.1.7 The intersection of the bounded intervals $[2, 4]$ and $[4, 6]$ is $\{4\}$,
which is also a bounded interval. The intersection of $(2, 4)$ and $(4, 6)$ is $\emptyset$.


We now give each bounded interval a length. 


Definition 11.1.8 (Length of intervals) If $I$ is a bounded interval, we define the
length of $I$, denoted $|I|$, as follows. If $I$ is one of the intervals $[a, b]$, $(a, b)$, $[a, b)$,
or $(a, b]$ for some real numbers $a < b$, then we define $|I| := b - a$. Otherwise, if $I$
is a point or the empty set, we define $|I| = 0$.
Example 11.1.9 For instance, the length of [3,5] is 2, as is the length of (3, 5);
meanwhile, the length of {5} or the empty set is 0. 


Definition 11.1.10 (Partitions) Let I be a bounded interval. A partition of I is a 
finite set P of bounded intervals contained in J, such that every x in J lies in exactly 
one of the bounded intervals J in P. 


Remark 11.1.11 Note that a partition is a set of intervals, while each interval is itself 
a set of real numbers. Thus a partition is a set consisting of other sets. 


Examples 11.1.12 The set $P = \{\{1\}, (1, 3), [3, 5), \{5\}, (5, 8], \emptyset\}$ of bounded intervals
is a partition of $[1, 8]$, because all the intervals in $P$ lie in $[1, 8]$, and each element
of [1, 8] lies in exactly one interval in P. Note that one could have removed the empty 
set from P and still obtain a partition. However, the set {[1, 4], [3, 5]} is not a parti- 
tion of [1, 5] because some elements of [1, 5] are included in more than one interval 
in the set. The set {(1, 3), (3, 5)} is not a partition of (1, 5) because some elements 
of (1,5) are not included in any interval in the set. The set {(0, 3), [3, 5)} is not a 
partition of (1,5) because some intervals in the set are not contained in (1, 5). 


Now we come to a basic property about length: 


Theorem 11.1.13 (Length is finitely additive) Let $I$ be a bounded interval, $n$ be a
natural number, and let $P$ be a partition of $I$ of cardinality $n$. Then

\[ |I| = \sum_{J \in P} |J|. \]


Proof We prove this by induction on $n$. More precisely, we let $P(n)$ be the property
that whenever $I$ is a bounded interval, and whenever $P$ is a partition of $I$ with
cardinality $n$, that $|I| = \sum_{J \in P} |J|$.

The base case $P(0)$ is trivial; the only way that $I$ can be partitioned into an empty
partition is if $I$ is itself empty (why?), at which point the claim is easy. The case
$P(1)$ is also very easy; the only way that $I$ can be partitioned into a singleton set $\{J\}$
is if $J = I$ (why?), at which point the claim is again very easy.

Now suppose inductively that $P(n)$ is true for some $n \ge 1$, and now we prove
$P(n+1)$. Let $I$ be a bounded interval, and let $P$ be a partition of $I$ of cardinality
$n+1$.

If $I$ is the empty set or a point, then all the intervals in $P$ must also be either the
empty set or a point (why?), and so every interval has length zero and the claim is
trivial. Thus we will assume that $I$ is an interval of the form $(a, b)$, $(a, b]$, $[a, b)$, or
$[a, b]$.

Let us first suppose that $b \in I$, i.e., $I$ is either $(a, b]$ or $[a, b]$. Since $b \in I$, we
know that one of the intervals $K$ in $P$ contains $b$. Since $K$ is contained in $I$, it must
therefore be of the form $(c, b]$, $[c, b]$, or $\{b\}$ for some real number $c$, with $a \le c \le b$
(in the latter case of $K = \{b\}$, we set $c := b$). In particular, this means that the set
$I - K$ is also an interval of the form $[a, c]$, $(a, c)$, $(a, c]$, $[a, c)$ when $c > a$, or a
point or empty set when $a = c$. Either way, we easily see that

\[ |I| = |K| + |I - K|. \]


On the other hand, since $P$ forms a partition of $I$, we see that $P - \{K\}$ forms a
partition of $I - K$ (why?). By the induction hypothesis, we thus have

\[ |I - K| = \sum_{J \in P - \{K\}} |J|. \]

Combining these two identities (and using the laws of addition for finite sets, see
Proposition 7.1.11) we obtain

\[ |I| = \sum_{J \in P} |J| \]


as desired. 

Now suppose that $b \notin I$, i.e., $I$ is either $(a, b)$ or $[a, b)$. Then one of the intervals
$K$ in $P$ is of the form $(c, b)$ or $[c, b)$ (see Exercise 11.1.3). In particular, this means
that the set $I - K$ is also an interval of the form $[a, c]$, $(a, c)$, $(a, c]$, $[a, c)$ when
$c > a$, or a point or empty set when $a = c$. The rest of the argument then proceeds
as above. 
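
Theorem 11.1.13 can be sanity-checked on the partition of $[1, 8]$ from Examples 11.1.12. The sketch below is only an illustration: the encoding of a bounded interval as a tuple `(lo, hi, lo_closed, hi_closed)`, with `None` standing for the empty set, is an assumption made for this example and not notation from the text.

```python
EMPTY = None  # stands for the empty interval in this sketch


def length(J):
    # |J| = hi - lo for a non-empty bounded interval, and 0 for the empty set.
    # Whether the endpoints are included does not affect the length.
    if J is EMPTY:
        return 0
    lo, hi, _lo_closed, _hi_closed = J
    return hi - lo


I = (1, 8, True, True)  # the interval [1, 8]

# The partition {{1}, (1,3), [3,5), {5}, (5,8], empty set} of [1, 8].
P = [
    (1, 1, True, True),    # {1}
    (1, 3, False, False),  # (1, 3)
    (3, 5, True, False),   # [3, 5)
    (5, 5, True, True),    # {5}
    (5, 8, False, True),   # (5, 8]
    EMPTY,
]

total = sum(length(J) for J in P)
print(total, length(I))
```

The individual lengths are 0, 2, 2, 0, 3 and 0, which sum to $7 = |[1, 8]|$, as the theorem predicts.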


There are two more things we need to do with partitions. One is to say when one 
partition is finer than another, and the other is to talk about the common refinement 
of two partitions. 


Definition 11.1.14 (Finer and coarser partitions) Let $I$ be a bounded interval, and
let $P$ and $P'$ be two partitions of $I$. We say that $P'$ is finer than $P$ (or equivalently,
that $P$ is coarser than $P'$) if for every $J$ in $P'$, there exists a $K$ in $P$ such that $J \subseteq K$.


Example 11.1.15 The partition {[1, 2), {2}, (2, 3), [3, 4]} is finer than {[1, 2], (2, 4]} 
(why?). Both partitions are finer than {[1, 4]}, which is the coarsest possible partition 
of [1, 4]. Note that there is no such thing as a “finest” partition of [1, 4]. (Why? recall 
all partitions are assumed to be finite.) We do not compare partitions of different 
intervals, for instance if P is a partition of [1, 4] and P’ is a partition of [2, 5] then 
we would not say that P is coarser or finer than P’. 


Definition 11.1.16 (Common refinement) Let $I$ be a bounded interval, and let $P$ and
$P'$ be two partitions of $I$. We define the common refinement $P \# P'$ of $P$ and $P'$ to be
the set

\[ P \# P' := \{ K \cap J : K \in P \text{ and } J \in P' \}. \]


Example 11.1.17 Let $P := \{[1, 3), [3, 4]\}$ and $P' := \{[1, 2], (2, 4]\}$ be two partitions
of $[1, 4]$. Then $P \# P'$ is the set $\{[1, 2], (2, 3), [3, 4], \emptyset\}$ (why?).
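
The common refinement in this example can be computed mechanically by intersecting every interval of one partition with every interval of the other. The Python sketch below does this for $P = \{[1,3), [3,4]\}$ and $P' = \{[1,2], (2,4]\}$. The tuple encoding `(lo, hi, lo_closed, hi_closed)`, the use of `None` for the empty set, and the function names are assumptions of this sketch; inputs are assumed to be non-empty intervals written in this form (or `None`).

```python
from itertools import product

EMPTY = None  # stands for the empty interval in this sketch


def intersect(a, b):
    """Intersection of two bounded intervals (lo, hi, lo_closed, hi_closed)."""
    if a is EMPTY or b is EMPTY:
        return EMPTY
    # Left endpoint: the larger one wins; on a tie it is included
    # only if both intervals include it.
    if a[0] > b[0]:
        lo, lo_closed = a[0], a[2]
    elif a[0] < b[0]:
        lo, lo_closed = b[0], b[2]
    else:
        lo, lo_closed = a[0], a[2] and b[2]
    # Right endpoint: the smaller one wins, with the same tie rule.
    if a[1] < b[1]:
        hi, hi_closed = a[1], a[3]
    elif a[1] > b[1]:
        hi, hi_closed = b[1], b[3]
    else:
        hi, hi_closed = a[1], a[3] and b[3]
    # Nothing is left if the endpoints cross, or meet at an excluded point.
    if lo > hi or (lo == hi and not (lo_closed and hi_closed)):
        return EMPTY
    return (lo, hi, lo_closed, hi_closed)


def common_refinement(P, Q):
    # The set of all intersections K ∩ J with K in P and J in Q.
    return {intersect(K, J) for K, J in product(P, Q)}


P = [(1, 3, True, False), (3, 4, True, True)]        # {[1,3), [3,4]}
P_prime = [(1, 2, True, True), (2, 4, False, True)]  # {[1,2], (2,4]}

refinement = common_refinement(P, P_prime)
print(refinement)
```

The result consists of $[1,2]$, $(2,3)$, $[3,4]$ and the empty set, the last arising from $[3,4] \cap [1,2]$.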


Lemma 11.1.18 Let $I$ be a bounded interval, and let $P$ and $P'$ be two partitions of
$I$. Then $P \# P'$ is also a partition of $I$, and is both finer than $P$ and finer than $P'$.


Proof See Exercise 11.1.4. 
— Exercises —


Exercise 11.1.1 Prove Lemma 11.1.4. (Hint: in order to show that (a) implies (b) in the case when 
X is non-empty, consider the supremum and infimum of X.) 


Exercise 11.1.2. Prove Corollary 11.1.6. (Hint: use Lemma 11.1.4, and explain why the intersection 
of two bounded sets is automatically bounded, and why the intersection of two connected sets is 
automatically connected.) 


Exercise 11.1.3 Let $I$ be a bounded interval of the form $I = (a, b)$ or $I = [a, b)$ for some real
numbers $a < b$. Let $I_1, \ldots, I_n$ be a partition of $I$. Prove that one of the intervals $I_j$ in this partition
is of the form $I_j = (c, b)$ or $I_j = [c, b)$ for some $a \le c \le b$. (Hint: prove by contradiction. First
show that if $I_j$ is not of the form $(c, b)$ or $[c, b)$ for any $a \le c \le b$, then $\sup I_j$ is strictly less than
$b$.)


Exercise 11.1.4 Prove Lemma 11.1.18. 


11.2 Piecewise Constant Functions 


We can now describe the class of “simple” functions which we can integrate very 
easily. 


Definition 11.2.1 (Constant functions) Let X be a subset of R, and let f: X > R 
be a function. We say that f is constant iff there exists a real number c such that 
$f(x) = c$ for all $x \in X$. If $E$ is a subset of $X$, we say that $f$ is constant on $E$ if the
restriction $f|_E$ of $f$ to $E$ is constant, in other words there exists a real number $c$ such
that $f(x) = c$ for all $x \in E$. We refer to $c$ as the constant value of $f$ on $E$.


Remark 11.2.2 If E is a non-empty set, then a function f which is constant on E 
can have only one constant value; it is not possible for a function to always equal 3 
on E while simultaneously always equalling 4. However, if E is empty, every real 
number c is a constant value for f on E (why?). 


Definition 11.2.3 (Piecewise constant functions I) Let I be a bounded interval, let 
$f: I \to \mathbf{R}$ be a function, and let $P$ be a partition of $I$. We say that $f$ is piecewise
constant with respect to P if for every J € P, f is constant on J. 


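Example 11.2.4 The function $f: [1,6] \to \mathbf{R}$ defined by

\[ f(x) = \begin{cases} 7 & \text{if } 1 \le x < 3 \\ 4 & \text{if } x = 3 \\ 5 & \text{if } 3 < x < 6 \\ 2 & \text{if } x = 6 \end{cases} \]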


is piecewise constant with respect to the partition $\{[1, 3), \{3\}, (3, 6), \{6\}\}$ of $[1, 6]$.
Note that it is also piecewise constant with respect to some other partitions as well;
for instance, it is piecewise constant with respect to the partition
$\{[1, 2), \{2\}, (2, 3), \{3\}, (3, 5), [5, 6), \{6\}, \emptyset\}$.
Definition 11.2.5 (Piecewise constant functions II) Let $I$ be a bounded interval, and
let $f: I \to \mathbf{R}$ be a function. We say that $f$ is piecewise constant on $I$ if there exists
a partition $P$ of $I$ such that $f$ is piecewise constant with respect to $P$.


Example 11.2.6 The function used in the previous example is piecewise constant 
on [1, 6]. Also, every constant function on a bounded interval J is automatically 
piecewise constant (why?). 


Lemma 11.2.7 Let I be a bounded interval, let P be a partition of I, and let $f: I \to \mathbf{R}$ 
be a function which is piecewise constant with respect to P. Let P' be a partition 
of I which is finer than P. Then f is also piecewise constant with respect to P'. 


Proof See Exercise 11.2.1. 


The space of piecewise constant functions is closed under algebraic operations: 


Lemma 11.2.8 Let I be a bounded interval, and let $f: I \to \mathbf{R}$ and $g: I \to \mathbf{R}$ be 
piecewise constant functions on I. Then the functions $f + g$, $f - g$, $\max(f, g)$ and 
$fg$ are also piecewise constant functions on I. Here of course $\max(f, g): I \to \mathbf{R}$ 
is the function $\max(f, g)(x) := \max(f(x), g(x))$. If g does not vanish anywhere on 
I (i.e., $g(x) \ne 0$ for all $x \in I$), then $f/g$ is also a piecewise constant function on I. 


Proof See Exercise 11.2.2. 


We are now ready to integrate piecewise constant functions. We begin with a 
temporary definition of an integral with respect to a partition. 


Definition 11.2.9 (Piecewise constant integral I) Let I be a bounded interval, let P 
be a partition of I. Let $f: I \to \mathbf{R}$ be a function which is piecewise constant with 
respect to P. Then we define the piecewise constant integral $\mathrm{p.c.}\int_{[P]} f$ of f with 
respect to the partition P by the formula 

$$\mathrm{p.c.}\int_{[P]} f := \sum_{J \in P} c_J |J|,$$

where for each J in P, we let $c_J$ be the constant value of f on J. 


Remark 11.2.10 This definition seems like it could be ill-defined, because if J is 
empty then every number $c_J$ can be the constant value of f on J, but fortunately in 
such cases $|J|$ is zero and so the choice of $c_J$ is irrelevant. The notation $\mathrm{p.c.}\int_{[P]} f$ 
is rather artificial, but we shall only need it temporarily, en route to a more useful 
definition. Note that since P is finite, the sum $\sum_{J \in P} c_J |J|$ is always well-defined (it 
is never divergent or infinite). 


Remark 11.2.11 The piecewise constant integral corresponds intuitively to one's 
notion of area, given that the area of a rectangle ought to be the product of the 
lengths of the sides. (Of course, if f is negative somewhere, then the "area" $c_J |J|$ 
would also be negative.) 


Example 11.2.12 Let $f: [1,4] \to \mathbf{R}$ be the function 

$$f(x) = \begin{cases} 2 & \text{if } 1 \le x < 3 \\ 4 & \text{if } x = 3 \\ 6 & \text{if } 3 < x \le 4 \end{cases}$$

and let $P := \{[1,3), \{3\}, (3,4]\}$. Then 

$$\mathrm{p.c.}\int_{[P]} f = c_{[1,3)} |[1,3)| + c_{\{3\}} |\{3\}| + c_{(3,4]} |(3,4]| = 2 \times 2 + 4 \times 0 + 6 \times 1 = 10.$$

Alternatively, if we let $P' := \{[1,2), [2,3), \{3\}, (3,4], \emptyset\}$ then 

$$\mathrm{p.c.}\int_{[P']} f = c_{[1,2)} |[1,2)| + c_{[2,3)} |[2,3)| + c_{\{3\}} |\{3\}| + c_{(3,4]} |(3,4]| + c_{\emptyset} |\emptyset|$$
$$= 2 \times 1 + 2 \times 1 + 4 \times 0 + 6 \times 1 + c_{\emptyset} \times 0 = 10.$$
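
The two computations in this example can be replayed mechanically. The following is a minimal Python sketch and not part of the text: the helper name `pc_integral` and the encoding of each interval J as a triple (left endpoint, right endpoint, constant value) are assumptions made for illustration. Since the length |J| depends only on the endpoints, the sketch does not record whether an endpoint belongs to J; a point and the empty set are both encoded with equal endpoints, so they have length zero.

```python
from fractions import Fraction

def pc_integral(pieces):
    """Sum of c_J * |J| over a partition given as (left, right, c_J) triples."""
    return sum(Fraction(c) * (Fraction(right) - Fraction(left))
               for left, right, c in pieces)

# P = {[1,3), {3}, (3,4]} with constant values 2, 4, 6 (Example 11.2.12).
P = [(1, 3, 2), (3, 3, 4), (3, 4, 6)]

# P' = {[1,2), [2,3), {3}, (3,4], empty set}. The value 99 on the empty set
# is an arbitrary placeholder: its length is zero, so the choice is irrelevant.
P_prime = [(1, 2, 2), (2, 3, 2), (3, 3, 4), (3, 4, 6), (0, 0, 99)]

print(pc_integral(P), pc_integral(P_prime))
```

Both calls return the same value, in line with the hand computation; exact rational arithmetic is used so that the comparison is not blurred by rounding.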


This example suggests that this integral does not really depend on what partition 
you pick, so long as your function is piecewise constant with respect to that partition. 
That is indeed true: 


Proposition 11.2.13 (Piecewise constant integral is independent of partition) Let I 
be a bounded interval, and let $f: I \to \mathbf{R}$ be a function. Suppose that P and P' are 
partitions of I such that f is piecewise constant both with respect to P and with 
respect to P'. Then $\mathrm{p.c.}\int_{[P]} f = \mathrm{p.c.}\int_{[P']} f$. 


Proof See Exercise 11.2.3. 


Because of this proposition, we can now make the following definition: 


Definition 11.2.14 (Piecewise constant integral II) Let I be a bounded interval, 
and let $f: I \to \mathbf{R}$ be a piecewise constant function on I. We define the piecewise 
constant integral $\mathrm{p.c.}\int_I f$ by the formula 

$$\mathrm{p.c.}\int_I f := \mathrm{p.c.}\int_{[P]} f,$$

where P is any partition of I with respect to which f is piecewise constant. (Note 
that Proposition 11.2.13 tells us that the precise choice of this partition is irrelevant.) 


Example 11.2.15 If f is the function given in Example 11.2.12, then $\mathrm{p.c.}\int_{[1,4]} f = 10$. 


We now give some basic properties of the piecewise constant integral. These laws 
will eventually be superseded by the corresponding laws for the Riemann integral 
(Theorem 11.4.1). 


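Theorem 11.2.16 (Laws of integration) Let I be a bounded interval, and let $f: I \to \mathbf{R}$ 
and $g: I \to \mathbf{R}$ be piecewise constant functions on I. 


(a) We have $\mathrm{p.c.}\int_I (f + g) = \mathrm{p.c.}\int_I f + \mathrm{p.c.}\int_I g$. 
(b) For any real number c, we have $\mathrm{p.c.}\int_I (cf) = c\,(\mathrm{p.c.}\int_I f)$. 
(c) We have $\mathrm{p.c.}\int_I (f - g) = \mathrm{p.c.}\int_I f - \mathrm{p.c.}\int_I g$. 
(d) If $f(x) \ge 0$ for all $x \in I$, then $\mathrm{p.c.}\int_I f \ge 0$. 
(e) If $f(x) \ge g(x)$ for all $x \in I$, then $\mathrm{p.c.}\int_I f \ge \mathrm{p.c.}\int_I g$. 
(f) If f is the constant function $f(x) = c$ for all x in I, then $\mathrm{p.c.}\int_I f = c|I|$. 
(g) Let J be a bounded interval containing I (i.e., $I \subseteq J$), and let $F: J \to \mathbf{R}$ be 
the function 

$$F(x) := \begin{cases} f(x) & \text{if } x \in I \\ 0 & \text{if } x \notin I. \end{cases}$$

Then F is piecewise constant on J, and $\mathrm{p.c.}\int_J F = \mathrm{p.c.}\int_I f$. 

(h) Suppose that $\{J, K\}$ is a partition of I into two intervals J and K. Then the 
functions $f|_J: J \to \mathbf{R}$ and $f|_K: K \to \mathbf{R}$ are piecewise constant on J and K 
respectively, and we have 

$$\mathrm{p.c.}\int_I f = \mathrm{p.c.}\int_J f|_J + \mathrm{p.c.}\int_K f|_K.$$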


Proof See Exercise 11.2.4. 
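

As an illustration of part (h), offered here as a worked check rather than as part of the original text, take the function f of Example 11.2.12 on I = [1, 4] and split I into J = [1, 3) and K = [3, 4]. Using the partition $\{[1,3)\}$ of J and the partition $\{\{3\}, (3,4]\}$ of K, one finds 

$$\mathrm{p.c.}\int_J f|_J = 2 \times |[1,3)| = 4, \qquad \mathrm{p.c.}\int_K f|_K = 4 \times |\{3\}| + 6 \times |(3,4]| = 0 + 6 = 6,$$

and $4 + 6 = 10$ agrees with the value of $\mathrm{p.c.}\int_{[1,4]} f$ computed in Example 11.2.12. 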


This concludes our integration of piecewise constant functions. We now turn to 
the question of how to integrate bounded functions. 


— Exercises — 


Exercise 11.2.1 Prove Lemma 11.2.7. 


Exercise 11.2.2 Prove Lemma 11.2.8. (Hint: use Lemmas 11.1.18 and 11.2.7 to make f and g 
piecewise constant with respect to the same partition of I.) 


Exercise 11.2.3 Prove Proposition 11.2.13. (Hint: first use Theorem 11.1.13 to show that both 
integrals are equal to $\mathrm{p.c.}\int_{[P \# P']} f$.) 


Exercise 11.2.4. Prove Theorem 11.2.16. (Hint: you can use earlier parts of the theorem to prove 
some of the later parts of the theorem. See also the hint to Exercise 11.2.2.) 


11.3 Upper and Lower Riemann Integrals 


Now let $f: I \to \mathbf{R}$ be a bounded function defined on a bounded interval I. We want 
to define the Riemann integral $\int_I f$. To do this we first need to define the notion of 
upper and lower Riemann integrals $\overline{\int}_I f$ and $\underline{\int}_I f$. These notions are related to the 
Riemann integral in much the same way that the lim sup and lim inf of a sequence 
are related to the limit of that sequence. 


Definition 11.3.1 (Majorization of functions) Let $f: I \to \mathbf{R}$ and $g: I \to \mathbf{R}$. We say 
that g majorizes f on I if we have $g(x) \ge f(x)$ for all $x \in I$, and that g minorizes 
f on I if $g(x) \le f(x)$ for all $x \in I$. 


The idea of the Riemann integral is to try to integrate a function by first majorizing 
or minorizing that function by a piecewise constant function (which we already know 
how to integrate). 


Definition 11.3.2 (Upper and lower Riemann integrals) Let $f: I \to \mathbf{R}$ be a bounded 
function defined on a bounded interval I. We define the upper Riemann integral $\overline{\int}_I f$ 
by the formula 

$$\overline{\int}_I f := \inf\left\{ \mathrm{p.c.}\int_I g : g \text{ is a p.c. function on } I \text{ which majorizes } f \right\}$$

and the lower Riemann integral $\underline{\int}_I f$ by the formula 

$$\underline{\int}_I f := \sup\left\{ \mathrm{p.c.}\int_I g : g \text{ is a p.c. function on } I \text{ which minorizes } f \right\}.$$


We give a crude but useful bound on the lower and upper integral: 


Lemma 11.3.3 Let $f: I \to \mathbf{R}$ be a function on a bounded interval I which is 
bounded by some real number M, i.e., $-M \le f(x) \le M$ for all $x \in I$. Then we 
have 

$$-M|I| \le \underline{\int}_I f \le \overline{\int}_I f \le M|I|.$$

In particular, both the lower and upper Riemann integrals are real numbers (i.e., 
they are not infinite). 


Proof The function $g: I \to \mathbf{R}$ defined by $g(x) = M$ is constant, hence piecewise 
constant, and majorizes f; thus $\overline{\int}_I f \le \mathrm{p.c.}\int_I g = M|I|$ by definition of the upper 
Riemann integral. A similar argument gives $-M|I| \le \underline{\int}_I f$. Finally, we have to show 
that $\underline{\int}_I f \le \overline{\int}_I f$. Let g be any piecewise constant function majorizing f, and let h 
be any piecewise constant function minorizing f. Then g majorizes h, and hence 
$\mathrm{p.c.}\int_I h \le \mathrm{p.c.}\int_I g$. Taking suprema in h, we obtain that $\underline{\int}_I f \le \mathrm{p.c.}\int_I g$. Taking 
infima in g, we thus obtain $\underline{\int}_I f \le \overline{\int}_I f$, as desired. 


We now know that the upper Riemann integral is always at least as large as the 
lower Riemann integral. If the two integrals match, then we can define the Riemann 
integral: 


Definition 11.3.4 (Riemann integral) Let $f: I \to \mathbf{R}$ be a bounded function on a 
bounded interval I. If $\underline{\int}_I f = \overline{\int}_I f$, then we say that f is Riemann integrable on I 
and define 

$$\int_I f := \underline{\int}_I f = \overline{\int}_I f.$$

If the upper and lower Riemann integrals are unequal, we say that f is not Riemann 
integrable. 


Remark 11.3.5 Compare this definition to the relationship between the lim sup, lim 
inf, and limit of a sequence $a_n$ that was established in Proposition 6.4.12(f); the lim 
sup is always greater than or equal to the lim inf, but they are only equal when the 
sequence converges, and in this case they are both equal to the limit of the sequence. 
The definition given above may differ from the definition you may have encountered 
in your calculus courses, based on Riemann sums. However, the two definitions turn 
out to be equivalent; this is the purpose of the next section. 


Remark 11.3.6 Note that we do not consider unbounded functions to be Riemann 
integrable; an integral involving such functions is known as an improper integral. It is 
possible to still evaluate such integrals using more sophisticated integration methods 
(such as the Lebesgue integral); we shall do this in Chap. 8. 


The Riemann integral is consistent with (and supersedes) the piecewise constant 
integral: 


Lemma 11.3.7 Let $f: I \to \mathbf{R}$ be a piecewise constant function on a bounded interval I. 
Then f is Riemann integrable, and $\int_I f = \mathrm{p.c.}\int_I f$. 


Proof See Exercise 11.3.3. 


Remark 11.3.8 Because of this lemma, we will not refer to the piecewise constant 
integral $\mathrm{p.c.}\int_I$ again, and just use the Riemann integral $\int_I$ throughout (until this 
integral is itself superseded by the Lebesgue integral in Chapter 8). We observe one 
special case of Lemma 11.3.7: if I is a point or the empty set, then $\int_I f = 0$ for all 
functions $f: I \to \mathbf{R}$. (Note that all such functions are automatically constant.) 


We have just shown that every piecewise constant function is Riemann integrable. 
However, the Riemann integral is more general and can integrate a wider class of 
functions; we shall see this shortly. For now, we connect the Riemann integral we 
have just defined to the concept of a Riemann sum, which you may have seen in other 
treatments of the Riemann integral. 


Definition 11.3.9 (Riemann sums) Let $f: I \to \mathbf{R}$ be a bounded function on a 
bounded interval I, and let P be a partition of I. We define the upper Riemann 
sum $U(f, P)$ and the lower Riemann sum $L(f, P)$ by 

$$U(f, P) := \sum_{J \in P : J \ne \emptyset} \Big( \sup_{x \in J} f(x) \Big) |J|$$

and 

$$L(f, P) := \sum_{J \in P : J \ne \emptyset} \Big( \inf_{x \in J} f(x) \Big) |J|.$$


Remark 11.3.10 The restriction $J \ne \emptyset$ is required because the quantities $\inf_{x \in J} f(x)$ 
and $\sup_{x \in J} f(x)$ are infinite (or negative infinite) if J is empty. 


We now connect these Riemann sums to the upper and lower Riemann integral. 


Lemma 11.3.11 Let $f: I \to \mathbf{R}$ be a bounded function on a bounded interval I, and 
let g be a function which majorizes f and which is piecewise constant with respect 
to some partition P of I. Then 

$$\mathrm{p.c.}\int_I g \ge U(f, P).$$

Similarly, if h is a function which minorizes f and is piecewise constant with respect 
to P, then 

$$\mathrm{p.c.}\int_I h \le L(f, P).$$


Proof See Exercise 11.3.4. 


Proposition 11.3.12 Let $f: I \to \mathbf{R}$ be a bounded function on a bounded interval 
I. Then 

$$\overline{\int}_I f = \inf\{ U(f, P) : P \text{ is a partition of } I \}$$

and 

$$\underline{\int}_I f = \sup\{ L(f, P) : P \text{ is a partition of } I \}.$$


Proof See Exercise 11.3.5. 


— Exercises — 


Exercise 11.3.1 Let $f: I \to \mathbf{R}$, $g: I \to \mathbf{R}$, and $h: I \to \mathbf{R}$ be functions. Show that if f majorizes 
g and g majorizes h, then f majorizes h. Show that if f and g majorize each other, then they must 
be equal. 


Exercise 11.3.2 Let $f: I \to \mathbf{R}$, $g: I \to \mathbf{R}$, and $h: I \to \mathbf{R}$ be functions. If f majorizes g, is it 
true that $f + h$ majorizes $g + h$? Is it true that $f \cdot h$ majorizes $g \cdot h$? If c is a real number, is it true 
that cf majorizes cg? 


Exercise 11.3.3. Prove Lemma 11.3.7. 
Exercise 11.3.4 Prove Lemma 11.3.11. 


Exercise 11.3.5 Prove Proposition 11.3.12. (Hint: you will need Lemma 11.3.11, even though this 
Lemma will only do half of the job.) 


11.4 Basic Properties of the Riemann Integral 


Just as we did with limits, series, and derivatives, we now give the basic laws for 
manipulating the Riemann integral. These laws will eventually be superseded by the 
corresponding laws for the Lebesgue integral (Proposition 8.3.3). 


Theorem 11.4.1 (Laws of Riemann integration) Let I be a bounded interval, and 
let $f: I \to \mathbf{R}$ and $g: I \to \mathbf{R}$ be Riemann integrable functions on I. 


(a) The function $f + g$ is Riemann integrable, and we have $\int_I (f + g) = \int_I f + \int_I g$. 
(b) For any real number c, the function cf is Riemann integrable, and we have 
$\int_I (cf) = c\,(\int_I f)$. 
(c) The function $f - g$ is Riemann integrable, and we have $\int_I (f - g) = \int_I f - \int_I g$. 
(d) If $f(x) \ge 0$ for all $x \in I$, then $\int_I f \ge 0$. 
(e) If $f(x) \ge g(x)$ for all $x \in I$, then $\int_I f \ge \int_I g$. 
(f) If f is the constant function $f(x) = c$ for all x in I, then $\int_I f = c|I|$. 
(g) Let J be a bounded interval containing I (i.e., $I \subseteq J$), and let $F: J \to \mathbf{R}$ be 
the function 

$$F(x) := \begin{cases} f(x) & \text{if } x \in I \\ 0 & \text{if } x \notin I. \end{cases}$$

Then F is Riemann integrable on J, and $\int_J F = \int_I f$. 

(h) Suppose that $\{J, K\}$ is a partition of I into two intervals J and K. Then the 
functions $f|_J: J \to \mathbf{R}$ and $f|_K: K \to \mathbf{R}$ are Riemann integrable on J and K, 
respectively, and we have 

$$\int_I f = \int_J f|_J + \int_K f|_K.$$


Proof See Exercise 11.4.1. 


Remark 11.4.2 We often abbreviate $\int_J f|_J$ as $\int_J f$, even though f is really defined 
on a larger domain than just J. We also observe from Theorem 11.4.1(h) and Remark 
11.3.8 that if $f: [a,b] \to \mathbf{R}$ is Riemann integrable on a closed interval [a, b], then 

$$\int_{[a,b]} f = \int_{(a,b]} f = \int_{[a,b)} f = \int_{(a,b)} f.$$


Theorem 11.4.1 asserts that the sum or difference of any two Riemann integrable 
functions is Riemann integrable, as is any scalar multiple cf of a Riemann integrable 
function f. We now give some further ways to create Riemann integrable functions. 


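Theorem 11.4.3 (Max and min preserve integrability) Let I be a bounded interval, 
and let $f: I \to \mathbf{R}$ and $g: I \to \mathbf{R}$ be Riemann integrable functions. Then the 
functions $\max(f, g): I \to \mathbf{R}$ and $\min(f, g): I \to \mathbf{R}$ defined by $\max(f, g)(x) := \max(f(x), g(x))$ 
and $\min(f, g)(x) := \min(f(x), g(x))$ are also Riemann integrable. 


Proof We shall just prove the claim for $\max(f, g)$, the case of $\min(f, g)$ being 
similar. First note that since f and g are bounded, then $\max(f, g)$ is also bounded. 

Let $\varepsilon > 0$. Since $\int_I f = \underline{\int}_I f$, there exists a piecewise constant function $\underline{f}: I \to \mathbf{R}$ 
which minorizes f on I such that 

$$\int_I \underline{f} \ge \int_I f - \varepsilon.$$

Similarly we can find a piecewise constant $\underline{g}: I \to \mathbf{R}$ which minorizes g on I such 
that 

$$\int_I \underline{g} \ge \int_I g - \varepsilon,$$

and we can find piecewise constant functions $\overline{f}$, $\overline{g}$ which majorize f, g respectively on I such 
that 

$$\int_I \overline{f} \le \int_I f + \varepsilon \quad \text{and} \quad \int_I \overline{g} \le \int_I g + \varepsilon.$$

In particular, if $h: I \to \mathbf{R}$ denotes the function 

$$h := (\overline{f} - \underline{f}) + (\overline{g} - \underline{g})$$

we have 

$$\int_I h \le 4\varepsilon.$$

On the other hand, $\max(\underline{f}, \underline{g})$ is a piecewise constant function on I (why?) which 
minorizes $\max(f, g)$ (why?), while $\max(\overline{f}, \overline{g})$ is similarly a piecewise constant function 
on I which majorizes $\max(f, g)$. Thus 

$$\int_I \max(\underline{f}, \underline{g}) \le \underline{\int}_I \max(f, g) \le \overline{\int}_I \max(f, g) \le \int_I \max(\overline{f}, \overline{g}),$$

and so 

$$0 \le \overline{\int}_I \max(f, g) - \underline{\int}_I \max(f, g) \le \int_I \big( \max(\overline{f}, \overline{g}) - \max(\underline{f}, \underline{g}) \big).$$

But we have 

$$\overline{f}(x) = \underline{f}(x) + (\overline{f} - \underline{f})(x) \le \underline{f}(x) + h(x)$$

and similarly 

$$\overline{g}(x) = \underline{g}(x) + (\overline{g} - \underline{g})(x) \le \underline{g}(x) + h(x)$$

and thus 

$$\max(\overline{f}(x), \overline{g}(x)) \le \max(\underline{f}(x), \underline{g}(x)) + h(x).$$

Inserting this into the previous inequality, we obtain 

$$0 \le \overline{\int}_I \max(f, g) - \underline{\int}_I \max(f, g) \le \int_I h \le 4\varepsilon.$$

To summarize, we have shown that 

$$0 \le \overline{\int}_I \max(f, g) - \underline{\int}_I \max(f, g) \le 4\varepsilon$$

for every $\varepsilon$. Since $\overline{\int}_I \max(f, g) - \underline{\int}_I \max(f, g)$ does not depend on $\varepsilon$, we thus see 
that 

$$\overline{\int}_I \max(f, g) - \underline{\int}_I \max(f, g) = 0$$

and hence that $\max(f, g)$ is Riemann integrable. 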


Corollary 11.4.4 (Absolute values preserve Riemann integrability) Let I be a 
bounded interval. If $f: I \to \mathbf{R}$ is a Riemann integrable function, then the positive 
part $f_+ := \max(f, 0)$ and the negative part $f_- := \min(f, 0)$ are also Riemann integrable 
on I. Also, the absolute value $|f|$ defined by $|f|(x) := |f(x)|$ is also Riemann 
integrable on I. (This latter claim follows from the observation that $|f| = f_+ - f_-$.) 


Theorem 11.4.5 (Products preserve Riemann integrability) Let I be a bounded 
interval. If $f: I \to \mathbf{R}$ and $g: I \to \mathbf{R}$ are Riemann integrable, then $fg: I \to \mathbf{R}$ 
is also Riemann integrable. 


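Proof This one is a little trickier. We split $f = f_+ + f_-$ and $g = g_+ + g_-$ into 
positive and negative parts; by Corollary 11.4.4, the functions $f_+$, $f_-$, $g_+$, $g_-$ are 
Riemann integrable. Since 

$$fg = f_+ g_+ + f_+ g_- + f_- g_+ + f_- g_-$$

then it suffices to show that the functions $f_+ g_+$, $f_+ g_-$, $f_- g_+$, $f_- g_-$ are individually 
Riemann integrable. We will just show this for $f_+ g_+$; the other three are similar. 

Since $f_+$ and $g_+$ are bounded and non-negative, there are $M_1, M_2 > 0$ such that 

$$0 \le f_+(x) \le M_1 \quad \text{and} \quad 0 \le g_+(x) \le M_2$$

for all $x \in I$. Now let $\varepsilon > 0$ be arbitrary. Then, as in the proof of Theorem 11.4.3, 
we can find a piecewise constant function $\underline{f_+}$ minorizing $f_+$ on I, and a piecewise 
constant function $\overline{f_+}$ majorizing $f_+$ on I, such that 

$$\int_I \overline{f_+} \le \int_I f_+ + \varepsilon$$

and 

$$\int_I \underline{f_+} \ge \int_I f_+ - \varepsilon.$$

Note that $\underline{f_+}$ may be negative at places, but we can fix this by replacing $\underline{f_+}$ by 
$\max(\underline{f_+}, 0)$, since this still minorizes $f_+$ (why?) and still has integral greater than 
or equal to $\int_I f_+ - \varepsilon$ (why?). So without loss of generality we may assume that 
$\underline{f_+}(x) \ge 0$ for all $x \in I$. Similarly we may assume that $\overline{f_+}(x) \le M_1$ for all $x \in I$; 
thus 

$$0 \le \underline{f_+}(x) \le f_+(x) \le \overline{f_+}(x) \le M_1$$

for all $x \in I$. 

Similar reasoning allows us to find piecewise constant $\underline{g_+}$ minorizing $g_+$, and $\overline{g_+}$ 
majorizing $g_+$, such that 

$$\int_I \overline{g_+} \le \int_I g_+ + \varepsilon$$

and 

$$\int_I \underline{g_+} \ge \int_I g_+ - \varepsilon$$

and 

$$0 \le \underline{g_+}(x) \le g_+(x) \le \overline{g_+}(x) \le M_2$$

for all $x \in I$. 

Notice that $\underline{f_+}\,\underline{g_+}$ is piecewise constant and minorizes $f_+ g_+$, while $\overline{f_+}\,\overline{g_+}$ is piecewise 
constant and majorizes $f_+ g_+$. Thus 

$$0 \le \overline{\int}_I f_+ g_+ - \underline{\int}_I f_+ g_+ \le \int_I \big( \overline{f_+}\,\overline{g_+} - \underline{f_+}\,\underline{g_+} \big).$$

However, we have 

$$\overline{f_+}(x)\overline{g_+}(x) - \underline{f_+}(x)\underline{g_+}(x) = \overline{f_+}(x)(\overline{g_+} - \underline{g_+})(x) + \underline{g_+}(x)(\overline{f_+} - \underline{f_+})(x)$$
$$\le M_1 (\overline{g_+} - \underline{g_+})(x) + M_2 (\overline{f_+} - \underline{f_+})(x)$$

for all $x \in I$, and thus 

$$0 \le \overline{\int}_I f_+ g_+ - \underline{\int}_I f_+ g_+ \le M_1 \int_I (\overline{g_+} - \underline{g_+}) + M_2 \int_I (\overline{f_+} - \underline{f_+}) \le M_1 (2\varepsilon) + M_2 (2\varepsilon).$$

Again, since $\varepsilon$ was arbitrary, we can conclude that $f_+ g_+$ is Riemann integrable, 
as before. Similar arguments show that $f_+ g_-$, $f_- g_+$, $f_- g_-$ are Riemann integrable; 
combining them we obtain that fg is Riemann integrable. 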


— Exercises — 


Exercise 11.4.1 Prove Theorem 11.4.1. (Hint: you may find Theorem 11.2.16 to be useful. For 
part (b): First do the case c > 0. Then do the case c = —1 and c = 0 separately. Using these cases, 
deduce the case of c < 0. You can use earlier parts of the theorem to prove later ones.) 


Exercise 11.4.2 Let I be a bounded interval, let $f: I \to \mathbf{R}$ be a Riemann integrable function, and 
let P be a partition of I. Show that 

$$\int_I f = \sum_{J \in P} \int_J f.$$


Exercise 11.4.3 Without repeating all the computations in the above proofs, give a short explanation 
as to why the remaining cases of Theorem 11.4.3 and Theorem 11.4.5 follow automatically from the 
cases presented in the text. (Hint: from Theorem 11.4.1 we know that if f is Riemann integrable, 
then so is —f.) 


11.5 Riemann Integrability of Continuous Functions 


We have already said a lot about Riemann integrable functions so far, but we have 
not yet actually produced any such functions other than the piecewise constant ones. 
Now we rectify this by showing that a large class of useful functions are Riemann 
integrable. We begin with the uniformly continuous functions. 


Theorem 11.5.1 Let I be a bounded interval, and let f be a function which is 
uniformly continuous on I. Then f is Riemann integrable. 


Proof From Proposition 9.9.15 we see that f is bounded. Now we have to show that 
$\underline{\int}_I f = \overline{\int}_I f$. 

If I is a point or the empty set then the theorem is trivial, so let us assume that I is 
one of the four intervals [a, b], (a, b), (a, b], or [a, b) for some real numbers a < b. 

Let $\varepsilon > 0$ be arbitrary. By uniform continuity, there exists a $\delta > 0$ such that 
$|f(x) - f(y)| < \varepsilon$ whenever $x, y \in I$ are such that $|x - y| < \delta$. By the Archimedean 
principle, there exists an integer $N > 0$ such that $(b - a)/N < \delta$. 

Note that we can partition I into N intervals $J_1, \ldots, J_N$, each of length $(b - a)/N$. 
(How? One has to treat each of the cases [a, b], (a, b), (a, b], [a, b) slightly 
differently.) By Proposition 11.3.12, we thus have 

$$\overline{\int}_I f \le \sum_{k=1}^{N} \Big( \sup_{x \in J_k} f(x) \Big) |J_k|$$

and 

$$\underline{\int}_I f \ge \sum_{k=1}^{N} \Big( \inf_{x \in J_k} f(x) \Big) |J_k|,$$

so in particular 

$$\overline{\int}_I f - \underline{\int}_I f \le \sum_{k=1}^{N} \Big( \sup_{x \in J_k} f(x) - \inf_{x \in J_k} f(x) \Big) |J_k|.$$

However, we have $|f(x) - f(y)| < \varepsilon$ for all $x, y \in J_k$, since $|J_k| = (b - a)/N < \delta$. 
In particular we have 

$$f(x) < f(y) + \varepsilon \text{ for all } x, y \in J_k.$$

Taking suprema in x, we obtain 

$$\sup_{x \in J_k} f(x) \le f(y) + \varepsilon \text{ for all } y \in J_k,$$

and then taking infima in y we obtain 

$$\sup_{x \in J_k} f(x) \le \inf_{y \in J_k} f(y) + \varepsilon.$$

Inserting this bound into our previous inequality, we obtain 

$$\overline{\int}_I f - \underline{\int}_I f \le \sum_{k=1}^{N} \varepsilon |J_k|,$$

but by Theorem 11.1.13 we thus have 

$$\overline{\int}_I f - \underline{\int}_I f \le \varepsilon (b - a).$$

But $\varepsilon > 0$ was arbitrary, while $(b - a)$ is fixed. Thus $\overline{\int}_I f - \underline{\int}_I f$ cannot be positive. 
By Lemma 11.3.3 and the definition of Riemann integrability we thus have that f is 
Riemann integrable. 
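

The choice of N in this proof can be traced numerically. The sketch below is an illustration only and not part of the text: it assumes the specific function $f(x) = x^2$ on $[-1, 1]$, for which $|f(x) - f(y)| \le 2|x - y|$, so that $\delta := \varepsilon/2$ works; the helper name `sup_inf_square` is hypothetical. For this particular f the supremum and infimum over an interval can be written down exactly, and they do not depend on whether the endpoints are included.

```python
from fractions import Fraction
from math import floor

def f(x):
    return x * x

def sup_inf_square(left, right):
    """Exact sup and inf of x^2 over an interval with endpoints left <= right."""
    hi = max(f(left), f(right))
    lo = Fraction(0) if left <= 0 <= right else min(f(left), f(right))
    return hi, lo

a, b = Fraction(-1), Fraction(1)
eps = Fraction(1, 10)
delta = eps / 2                    # valid since |x^2 - y^2| <= 2|x - y| on [-1, 1]
N = floor((b - a) / delta) + 1     # smallest integer N with (b - a)/N < delta
width = (b - a) / N

gap = Fraction(0)                  # sum over k of (sup - inf) * |J_k|
for k in range(N):
    left, right = a + k * width, a + (k + 1) * width
    hi, lo = sup_inf_square(left, right)
    gap += (hi - lo) * width

print(N, gap, gap <= eps * (b - a))
```

The quantity `gap` is the difference $U(f, P) - L(f, P)$ for the uniform partition, which bounds the difference of the upper and lower Riemann integrals exactly as in the proof.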


Combining Theorem 11.5.1 with Theorem 9.9.16, we thus obtain 


Corollary 11.5.2 Let [a, b] be a closed interval, and let $f: [a,b] \to \mathbf{R}$ be continuous. 
Then f is Riemann integrable. 


Note that this Corollary is not true if [a, b] is replaced by any other sort of interval, 
since it is not even guaranteed then that continuous functions are bounded. For 
instance, the function $f: (0, 1) \to \mathbf{R}$ defined by $f(x) := 1/x$ is continuous but not 
Riemann integrable. However, if we assume that a function is both continuous and 
bounded, we can recover Riemann integrability: 


Proposition 11.5.3 Let I be a bounded interval, and let $f: I \to \mathbf{R}$ be both continuous 
and bounded. Then f is Riemann integrable on I. 


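Proof If I is a point or an empty set then the claim is trivial; if I is a closed interval 
the claim follows from Corollary 11.5.2. So let us assume that I is of the form (a, b], 
(a, b), or [a, b) for some a < b. 

We have a bound M for f, so that $-M \le f(x) \le M$ for all $x \in I$. Now let 
$0 < \varepsilon < (b - a)/2$ be a small number. The function f when restricted to the interval 
$[a + \varepsilon, b - \varepsilon]$ is continuous, and hence Riemann integrable by Corollary 11.5.2. In 
particular, we can find a piecewise constant function $h: [a + \varepsilon, b - \varepsilon] \to \mathbf{R}$ which 
majorizes f on $[a + \varepsilon, b - \varepsilon]$ such that 

$$\int_{[a+\varepsilon, b-\varepsilon]} h \le \int_{[a+\varepsilon, b-\varepsilon]} f + \varepsilon.$$

Define $\tilde{h}: I \to \mathbf{R}$ by 

$$\tilde{h}(x) := \begin{cases} h(x) & \text{if } x \in [a + \varepsilon, b - \varepsilon] \\ M & \text{if } x \in I \setminus [a + \varepsilon, b - \varepsilon]. \end{cases}$$

Clearly $\tilde{h}$ is piecewise constant on I and majorizes f; by Theorem 11.2.16 we have 

$$\int_I \tilde{h} = \varepsilon M + \int_{[a+\varepsilon, b-\varepsilon]} h + \varepsilon M \le \int_{[a+\varepsilon, b-\varepsilon]} f + (2M + 1)\varepsilon.$$

In particular we have 

$$\overline{\int}_I f \le \int_{[a+\varepsilon, b-\varepsilon]} f + (2M + 1)\varepsilon.$$

A similar argument gives 

$$\underline{\int}_I f \ge \int_{[a+\varepsilon, b-\varepsilon]} f - (2M + 1)\varepsilon$$

and hence 

$$\overline{\int}_I f - \underline{\int}_I f \le (4M + 2)\varepsilon.$$

But $\varepsilon$ is arbitrary, and so we can argue as in the proof of Theorem 11.5.1 to conclude 
Riemann integrability. 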


This gives a large class of Riemann integrable functions already; the bounded 
continuous functions. But we can expand this class a little more, to include the 
bounded piecewise continuous functions. 


Definition 11.5.4 Let I be a bounded interval, and let $f: I \to \mathbf{R}$. We say that f is 
piecewise continuous on I iff there exists a partition P of I such that $f|_J$ is continuous 
on J for all $J \in P$. 


Example 11.5.5 The function $F: [1,3] \to \mathbf{R}$ defined by 

$$F(x) := \begin{cases} x^2 & \text{if } 1 \le x < 2 \\ 7 & \text{if } x = 2 \\ x^3 & \text{if } 2 < x \le 3 \end{cases}$$

is not continuous on [1, 3], but it is piecewise continuous on [1, 3] (since it is continuous 
when restricted to [1, 2) or {2} or (2, 3], and those three intervals partition 
[1, 3]). 


Proposition 11.5.6 Let I be a bounded interval, and let $f: I \to \mathbf{R}$ be both piecewise 
continuous and bounded. Then f is Riemann integrable. 


Proof See Exercise 11.5.1. 


— Exercises — 


Exercise 11.5.1 Prove Proposition 11.5.6. (Hint: use Theorem 11.4.1(a) and (g).) 


Exercise 11.5.2 Let a < b be real numbers, and let $f: [a,b] \to \mathbf{R}$ be a continuous, non-negative 
function (so $f(x) \ge 0$ for all $x \in [a, b]$). Suppose that $\int_{[a,b]} f = 0$. Show that $f(x) = 0$ for all 
$x \in [a, b]$. (Hint: argue by contradiction.) 


11.6 Riemann Integrability of Monotone Functions 


In addition to piecewise continuous functions, another wide class of functions is 
Riemann integrable, namely the monotone functions. We give two instances of this: 


Proposition 11.6.1 Let [a, b] be a closed and bounded interval and let $f: [a,b] \to \mathbf{R}$ 
be a monotone function. Then f is Riemann integrable on [a, b]. 


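Remark 11.6.2 From Exercise 9.8.5 we know that there exist monotone functions 
which are not piecewise continuous, so this proposition is not subsumed by 
Proposition 11.5.6. 


Proof Without loss of generality we may take f to be monotone increasing (instead 
of monotone decreasing). From Exercise 9.8.1 we know that f is bounded. Now let 
$N > 0$ be an integer, and partition [a, b] into N half-open intervals 
$\{[a + \frac{b-a}{N} j, a + \frac{b-a}{N} (j+1)) : 0 \le j \le N - 1\}$ of length $(b - a)/N$, together with the point {b}. 
Then by Proposition 11.3.12 we have 

$$\overline{\int}_{[a,b]} f \le \sum_{j=0}^{N-1} \Big( \sup_{x \in [a + \frac{b-a}{N} j,\, a + \frac{b-a}{N}(j+1))} f(x) \Big) \frac{b-a}{N}$$

(the point {b} clearly giving only a zero contribution). Since f is monotone increasing, 
we thus have 

$$\overline{\int}_{[a,b]} f \le \sum_{j=0}^{N-1} f\Big( a + \frac{b-a}{N}(j+1) \Big) \frac{b-a}{N}.$$

Similarly we have 

$$\underline{\int}_{[a,b]} f \ge \sum_{j=0}^{N-1} f\Big( a + \frac{b-a}{N} j \Big) \frac{b-a}{N}.$$

Thus we have 

$$\overline{\int}_{[a,b]} f - \underline{\int}_{[a,b]} f \le \sum_{j=0}^{N-1} \Big( f\Big( a + \frac{b-a}{N}(j+1) \Big) - f\Big( a + \frac{b-a}{N} j \Big) \Big) \frac{b-a}{N}.$$

Using telescoping series (Lemma 7.2.14) we thus have 

$$\overline{\int}_{[a,b]} f - \underline{\int}_{[a,b]} f \le \Big( f\Big( a + \frac{b-a}{N} N \Big) - f\Big( a + \frac{b-a}{N} 0 \Big) \Big) \frac{b-a}{N} = (f(b) - f(a)) \frac{b-a}{N}.$$

But N was arbitrary, so we can conclude as in the proof of Theorem 11.5.1 that f is 
Riemann integrable. 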
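

The telescoping bound in this proof is easy to check on a concrete case. The following Python sketch is illustrative only and not part of the text: the helper name `monotone_bounds` and the choice $f(x) = x^3$ on [0, 2] with N = 8 are assumptions. It evaluates the two sums from the proof (right endpoints for the upper bound, left endpoints for the lower bound) and compares their difference with $(f(b) - f(a))(b - a)/N$.

```python
from fractions import Fraction

def monotone_bounds(f, a, b, N):
    """Right- and left-endpoint sums from the proof, for monotone increasing f."""
    a, b = Fraction(a), Fraction(b)
    width = (b - a) / N
    upper = sum(f(a + width * (j + 1)) * width for j in range(N))
    lower = sum(f(a + width * j) * width for j in range(N))
    return upper, lower

f = lambda x: x ** 3          # monotone increasing on [0, 2]
a, b, N = 0, 2, 8
upper, lower = monotone_bounds(f, a, b, N)
gap = upper - lower

# The telescoping identity predicts gap == (f(b) - f(a)) * (b - a) / N.
print(upper, lower, gap)
```

Doubling N halves the gap, which is the quantitative content of "N was arbitrary" in the last step of the proof.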


Corollary 11.6.3 Let I be a bounded interval, and let $f: I \to \mathbf{R}$ be both monotone 
and bounded. Then f is Riemann integrable on I. 


Proof See Exercise 11.6.1. 


We now give the famous integral test for determining convergence of monotone 
decreasing series. 


Proposition 11.6.4 (Integral test) Let $f: [0, \infty) \to \mathbf{R}$ be a monotone decreasing 
function which is non-negative (i.e., $f(x) \ge 0$ for all $x \ge 0$). Then the sum 
$\sum_{n=0}^{\infty} f(n)$ is convergent if and only if $\sup_{N > 0} \int_{[0,N]} f$ is finite. 


Proof See Exercise 11.6.3. 


Corollary 11.6.5 Let p be a real number. Then $\sum_{n=1}^{\infty} \frac{1}{n^p}$ converges absolutely when 
$p > 1$ and diverges when $p \le 1$. 


Proof See Exercise 11.6.5. 
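

To see the mechanism behind the integral test on one example, the sketch below compares partial sums with an integral for the non-negative, monotone decreasing function $f(x) = 1/(1+x)^2$. It is an illustration under stated assumptions and not part of the text: the function, the helper names, and in particular the closed form $\int_{[0,N]} f = 1 - \frac{1}{1+N}$ (taken from elementary calculus, which the text has not yet developed at this point) are all supplied for the example.

```python
from fractions import Fraction

def f(x):
    """A non-negative, monotone decreasing function on [0, infinity)."""
    return Fraction(1) / (1 + Fraction(x)) ** 2

def integral_0_to_N(N):
    # Closed form of the integral of 1/(1+x)^2 over [0, N]; assumed from
    # elementary calculus rather than derived from the text so far.
    return 1 - Fraction(1, 1 + N)

N = 10
shifted_sum = sum(f(n) for n in range(1, N + 1))   # f(1) + ... + f(N)
partial_sum = sum(f(n) for n in range(0, N))       # f(0) + ... + f(N-1)

# For monotone decreasing f the integral is squeezed between the two sums.
print(shifted_sum <= integral_0_to_N(N) <= partial_sum)
```

Because the integrals here stay below 1 for every N, the squeeze keeps the partial sums bounded, which is the direction of the test that yields convergence.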


— Exercises — 


Exercise 11.6.1 Use Proposition 11.6.1 to prove Corollary 11.6.3. (Hint: adapt the proof of Propo- 
sition 11.5.3.) 


Exercise 11.6.2 Formulate a reasonable notion of a piecewise monotone function, and then show 
that all bounded piecewise monotone functions are Riemann integrable. 


Exercise 11.6.3 Prove Proposition 11.6.4. (Hint: what is the relationship between the sum 
$\sum_{n=1}^{N} f(n)$, the sum $\sum_{n=0}^{N-1} f(n)$, and the integral $\int_{[0,N]} f$?) 


Exercise 11.6.4 Give examples to show that both directions of the integral test break down if f is 
not assumed to be monotone decreasing. 


Exercise 11.6.5 Use Proposition 11.6.4 to prove Corollary 11.6.5. (For this exercise, you may use 
the second Fundamental Theorem of Calculus (Theorem 11.9.4); there is no circularity, because 
Corollary 11.6.5 is not used in the proof of that theorem.) 


11.7 A Non-Riemann Integrable Function 


We have shown that there are large classes of bounded functions which are Riemann 
integrable. Unfortunately, there do exist bounded functions which are not Riemann 
integrable: 


Proposition 11.7.1 Let $f: [0,1] \to \mathbf{R}$ be the discontinuous function 

$$f(x) := \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q} \end{cases}$$

considered in Example 9.3.21. Then f is bounded but not Riemann integrable. 


Proof It is clear that f is bounded, so let us show that it is not Riemann integrable. 
Let P be any partition of [0, 1]. For any $J \in P$, observe that if J is not a point or 
the empty set, then 

$$\sup_{x \in J} f(x) = 1$$

(by Proposition 5.4.14). In particular we have 

$$\Big( \sup_{x \in J} f(x) \Big) |J| = |J|.$$

(Note this is also true when J is a point, since both sides are zero.) In particular we 
see that 

$$U(f, P) = \sum_{J \in P : J \ne \emptyset} |J| = |[0,1]| = 1$$

by Theorem 11.1.13; note that the empty set does not contribute anything to the total 
length. In particular we have $\overline{\int}_{[0,1]} f = 1$, by Proposition 11.3.12. 

A similar argument gives that 

$$\inf_{x \in J} f(x) = 0$$

for all J (other than points or the empty set), and so 

$$L(f, P) = \sum_{J \in P : J \ne \emptyset} 0 = 0.$$

In particular we have $\underline{\int}_{[0,1]} f = 0$, by Proposition 11.3.12. Thus the upper and lower 
Riemann integrals do not match, and so this function is not Riemann integrable. 
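

For a concrete instance of this computation (an illustration, not part of the original proof), take the partition $P = \{[0, \tfrac12), [\tfrac12, 1]\}$ of [0, 1]. Each piece contains both rational and irrational numbers, so 

$$U(f, P) = 1 \cdot \tfrac12 + 1 \cdot \tfrac12 = 1, \qquad L(f, P) = 0 \cdot \tfrac12 + 0 \cdot \tfrac12 = 0,$$

and refining the partition changes neither value, which is why the infimum of the upper sums and the supremum of the lower sums stay at 1 and 0 respectively. 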


Remark 11.7.2 As you can see, it is only rather "artificial" bounded functions which 
are not Riemann integrable. Because of this, the Riemann integral is good enough 
for a large majority of cases. There are ways to generalize or improve this integral, 
though. One of these is the Lebesgue integral, which we will define in Chapter 8. 
Another is the Riemann-Stieltjes integral $\int_I f\,d\alpha$, where $\alpha: I \to \mathbf{R}$ is a monotone 
increasing function, which we define in the next section. 


11.8 The Riemann-Stieltjes Integral 


Let I be a bounded interval, let $\alpha: I \to \mathbf{R}$ be a monotone increasing function, and 
let $f: I \to \mathbf{R}$ be a function. Then there is a generalization of the Riemann integral, 
known as the Riemann-Stieltjes integral. This integral is defined just like the Riemann 
integral, but with one twist: instead of taking the length $|J|$ of intervals J, we take 
the $\alpha$-length $\alpha[J]$, defined as follows. 


Definition 11.8.1 ($\alpha$-length) Let I be a bounded interval, let X be an interval that 
is closed (in the sense of Definition 9.1.15) containing I, and let $\alpha: X \to \mathbf{R}$ be a 
monotone increasing function (i.e., $\alpha(y) \ge \alpha(x)$ whenever $x, y \in X$ are such that 
$y > x$). Then we define the $\alpha$-length $\alpha[I]$ of I by the following rules. 

(i) If I is empty, then $\alpha[I] := 0$. 

(ii) If $I = \{a\}$ is a point, then $\alpha[I] := \lim_{x \to a^+; x \in X} \alpha(x) - \lim_{x \to a^-; x \in X} \alpha(x)$, with 
the convention that $\lim_{x \to a^+; x \in X} \alpha(x)$ (resp. $\lim_{x \to a^-; x \in X} \alpha(x)$) is equal to $\alpha(a)$ 
when a is the right-endpoint (resp. left-endpoint) of X. 

(iii) If $I = (a, b)$, set $\alpha[I] := \lim_{x \to b^-; x \in X} \alpha(x) - \lim_{x \to a^+; x \in X} \alpha(x)$. 

(iv) If I is equal to (a, b], [a, b), or [a, b], set $\alpha[I]$ equal to $\alpha[(a,b)] + \alpha[\{b\}]$, 
$\alpha[\{a\}] + \alpha[(a,b)]$, or $\alpha[\{a\}] + \alpha[(a,b)] + \alpha[\{b\}]$, respectively. 


This definition is complicated, but note that in the special case where $\alpha$ is continuous, 
we have the simpler formula 

$$\alpha[I] = \alpha(b) - \alpha(a) \qquad (11.1)$$

whenever a < b and I is equal to (a, b), (a, b], [a, b), or [a, b]. Using this simplified 
formula, one can also define $\alpha[I]$ for other continuous functions that are not 
necessarily monotone increasing. 


Example 11.8.2 Let $\alpha: [0, +\infty) \to \mathbf{R}$ be the function $\alpha(x) := x^2$. Then 
$\alpha[[2,3]] = \alpha(3) - \alpha(2) = 9 - 4 = 5$, $\alpha[\{2\}] = 0$ and $\alpha[\emptyset] = 0$. 


Example 11.8.3 Let $\alpha: \mathbf{R} \to \mathbf{R}$ be the identity function $\alpha(x) := x$. Then $\alpha[I] = |I|$ 
for all bounded intervals I (why?). Thus the notion of length is a special case of the 
notion of $\alpha$-length. 


We sometimes write $\alpha|_a^b$ or $\alpha(x)|_{x=a}^{x=b}$ instead of $\alpha[[a,b]]$. 

One of the key theorems for the theory of the Riemann integral was Theorem 
11.1.13, which concerned length and partitions, and in particular showed that 
$|I| = \sum_{J \in P} |J|$ whenever P was a partition of I. We now generalize this slightly. 


Lemma 11.8.4 Let I be a bounded interval, let $\alpha: X \to \mathbf{R}$ be a monotone increasing 
or continuous function defined on some interval X which is closed and which contains 
I, and let P be a partition of I. Then we have 

$$\alpha[I] = \sum_{J \in P} \alpha[J].$$


Proof See Exercise 11.8.1. 


We can now define a generalization of Definition 11.2.9. 


Definition 11.8.5 (P.c. Riemann-Stieltjes integral) Let $I$ be a bounded interval, and let $P$ be a partition of $I$. Let $\alpha: X \to \mathbf{R}$ be a monotone increasing or continuous function defined on some interval $X$ which is closed and contains $I$, and let $f: I \to \mathbf{R}$ be a function which is piecewise constant with respect to $P$. Then we define

$$\text{p.c.} \int_{[P]} f \, d\alpha := \sum_{J \in P} c_J \alpha[J]$$

where $c_J$ is the constant value of $f$ on $J$.

Example 11.8.6 Let $f: [1, 3] \to \mathbf{R}$ be the function

$$f(x) := \begin{cases} 4 & \text{when } x \in [1, 2) \\ 2 & \text{when } x \in [2, 3], \end{cases}$$

let $\alpha: [0, +\infty) \to \mathbf{R}$ be the function $\alpha(x) := x^2$, and let $P$ be the partition $P := \{[1, 2), [2, 3]\}$. Then

$$\text{p.c.} \int_{[P]} f \, d\alpha = c_{[1,2)} \alpha[[1, 2)] + c_{[2,3]} \alpha[[2, 3]] = 4(\alpha(2) - \alpha(1)) + 2(\alpha(3) - \alpha(2)) = 4 \times 3 + 2 \times 5 = 22.$$
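The computation in Example 11.8.6 can be checked mechanically. The following is a minimal sketch, assuming that $\alpha$ is continuous so that the simplified formula (11.1) applies; the names `alpha_length`, `pc_stieltjes` and `square` are illustrative only and do not appear in the text.

```python
def alpha_length(alpha, a, b):
    """alpha-length of an interval with endpoints a < b.

    Assumes alpha is continuous, so the simplified formula (11.1) applies
    regardless of which endpoints the interval contains.
    """
    return alpha(b) - alpha(a)


def pc_stieltjes(pieces, alpha):
    """Piecewise constant Riemann-Stieltjes integral.

    pieces is a list of (a, b, c): f takes the constant value c on an
    interval of the partition with endpoints a < b.
    """
    return sum(c * alpha_length(alpha, a, b) for a, b, c in pieces)


def square(x):
    return x ** 2


# Example 11.8.6: f = 4 on [1, 2), f = 2 on [2, 3], alpha(x) = x^2.
pieces = [(1, 2, 4), (2, 3, 2)]
print(pc_stieltjes(pieces, square))  # 4*3 + 2*5 = 22
```

Replacing `square` by the identity function recovers the ordinary piecewise constant integral, in line with Example 11.8.7.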


Example 11.8.7 Let $\alpha: \mathbf{R} \to \mathbf{R}$ be the identity function $\alpha(x) := x$. Then for any bounded interval $I$, any partition $P$ of $I$, and any function $f$ that is piecewise constant with respect to $P$, we have $\text{p.c.} \int_{[P]} f \, d\alpha = \text{p.c.} \int_{[P]} f$ (why?).


We can obtain an exact analogue of Proposition 11.2.13 by replacing all the integrals $\text{p.c.} \int_{[P]} f$ in the proposition with $\text{p.c.} \int_{[P]} f \, d\alpha$ (Exercise 11.8.2). We can thus define $\text{p.c.} \int_I f \, d\alpha$ for any piecewise constant function $f: I \to \mathbf{R}$ and any $\alpha: X \to \mathbf{R}$ defined on an interval that is closed and contains $I$, in analogy to before, by the formula

$$\text{p.c.} \int_I f \, d\alpha := \text{p.c.} \int_{[P]} f \, d\alpha$$

for any partition $P$ of $I$ with respect to which $f$ is piecewise constant.

Let us now assume that $\alpha$ is monotone increasing. This implies that $\alpha[I] \geq 0$ for all intervals $I$ in $X$ (why?). From this one can easily verify that all the results from Theorem 11.2.16 continue to hold when the integrals $\text{p.c.} \int_I f$ are replaced by $\text{p.c.} \int_I f \, d\alpha$, and the lengths $|I|$ are replaced by the $\alpha$-lengths $\alpha[I]$; see Exercise 11.8.3.

We can then define upper and lower Riemann-Stieltjes integrals $\overline{\int}_I f \, d\alpha$ and $\underline{\int}_I f \, d\alpha$ whenever $f: I \to \mathbf{R}$ is bounded and $\alpha$ is defined on an interval that is closed and contains $I$, by the usual formulae

$$\overline{\int}_I f \, d\alpha := \inf\left\{ \text{p.c.} \int_I g \, d\alpha : g \text{ is p.c. on } I \text{ and majorizes } f \right\}$$

and

$$\underline{\int}_I f \, d\alpha := \sup\left\{ \text{p.c.} \int_I g \, d\alpha : g \text{ is p.c. on } I \text{ and minorizes } f \right\}.$$

We then say that $f$ is Riemann-Stieltjes integrable on $I$ with respect to $\alpha$ if the upper and lower Riemann-Stieltjes integrals match, in which case we set

$$\int_I f \, d\alpha := \overline{\int}_I f \, d\alpha = \underline{\int}_I f \, d\alpha.$$


As before, when $\alpha$ is the identity function $\alpha(x) := x$ then the Riemann-Stieltjes integral is identical to the Riemann integral; thus the Riemann-Stieltjes integral is a generalization of the Riemann integral. (We shall see another comparison between the two integrals a little later, in Corollary 11.10.3.) Because of this, we sometimes write $\int_I f$ as $\int_I f \, dx$ or $\int_I f(x) \, dx$.

Most (but not all) of the remaining theory of the Riemann integral then can be carried over without difficulty, replacing Riemann integrals with Riemann-Stieltjes integrals and lengths with $\alpha$-lengths. There are a couple of results which break down: Theorem 11.4.1(g), Proposition 11.5.3, and Proposition 11.5.6 are not necessarily true when $\alpha$ is discontinuous at key places (e.g., if $f$ and $\alpha$ are both discontinuous at the same point, then $\int_I f \, d\alpha$ is unlikely to be defined). However, Theorem 11.5.1 is still true (Exercise 11.8.4).


— Exercises — 
Exercise 11.8.1 Prove Lemma 11.8.4. (Hint: modify the proof of Theorem 11.1.13.) 
Exercise 11.8.2 State and prove a version of Proposition 11.2.13 for the Riemann-Stieltjes integral.
Exercise 11.8.3 State and prove a version of Theorem 11.2.16 for the Riemann-Stieltjes integral.
Exercise 11.8.4 State and prove a version of Theorem 11.5.1 for the Riemann-Stieltjes integral.
Exercise 11.8.5 Let $\text{sgn}: \mathbf{R} \to \mathbf{R}$ be the signum function

$$\text{sgn}(x) := \begin{cases} 1 & \text{when } x > 0 \\ 0 & \text{when } x = 0 \\ -1 & \text{when } x < 0. \end{cases}$$

Let $f: [-1, 1] \to \mathbf{R}$ be a continuous function. Show that $f$ is Riemann-Stieltjes integrable with respect to sgn, and that

$$\int_{[-1,1]} f \, d\,\text{sgn} = 2 f(0).$$

(Hint: for every $\varepsilon > 0$, find piecewise constant functions majorizing and minorizing $f$ whose Riemann-Stieltjes integral is $\varepsilon$-close to $2 f(0)$.)


11.9 The Two Fundamental Theorems of Calculus 


We now have enough machinery to connect integration and differentiation via the familiar fundamental theorem of calculus. Actually, there are two such theorems, one involving the derivative of the integral, and the other involving the integral of the derivative.

Theorem 11.9.1 (First Fundamental Theorem of Calculus) Let $a < b$ be real numbers, and let $f: [a, b] \to \mathbf{R}$ be a Riemann integrable function. Let $F: [a, b] \to \mathbf{R}$ be the function

$$F(x) := \int_{[a,x]} f.$$

Then $F$ is continuous. Furthermore, if $x_0 \in [a, b]$ and $f$ is continuous at $x_0$, then $F$ is differentiable at $x_0$, and $F'(x_0) = f(x_0)$.


Proof Since $f$ is Riemann integrable, it is bounded (by Definition 11.3.4). Thus we have some real number $M$ such that $-M \leq f(x) \leq M$ for all $x \in [a, b]$.

Now let $x < y$ be two elements of $[a, b]$. Then notice that

$$F(y) - F(x) = \int_{[a,y]} f - \int_{[a,x]} f = \int_{[x,y]} f$$

by Theorem 11.4.1(h). By Theorem 11.4.1(e) we thus have

$$\int_{[x,y]} f \leq \int_{[x,y]} M = \text{p.c.} \int_{[x,y]} M = M(y - x)$$

and

$$\int_{[x,y]} f \geq \int_{[x,y]} -M = \text{p.c.} \int_{[x,y]} -M = -M(y - x)$$

and thus

$$|F(y) - F(x)| \leq M(y - x).$$

This is for $y > x$. By interchanging $x$ and $y$ we thus see that

$$|F(y) - F(x)| \leq M(x - y)$$

when $x > y$. Also, we have $F(y) - F(x) = 0$ when $x = y$. Thus in all three cases we have

$$|F(y) - F(x)| \leq M|x - y|.$$

This implies that $F$ is uniformly continuous (in fact it is Lipschitz continuous, see Exercise 10.2.6), and hence continuous.

Now suppose that $x_0 \in [a, b]$, and $f$ is continuous at $x_0$. Choose any $\varepsilon > 0$. Then by continuity, we can find a $\delta > 0$ such that $|f(x) - f(x_0)| \leq \varepsilon$ for all $x$ in the interval $I := [x_0 - \delta, x_0 + \delta] \cap [a, b]$, or in other words

$$f(x_0) - \varepsilon \leq f(x) \leq f(x_0) + \varepsilon \text{ for all } x \in I.$$

We now show that

$$|F(y) - F(x_0) - f(x_0)(y - x_0)| \leq \varepsilon |y - x_0|$$

for all $y \in I$, since Proposition 10.1.7 will then imply that $F$ is differentiable at $x_0$ with derivative $F'(x_0) = f(x_0)$ as desired.

Now fix $y \in I$. There are three cases. If $y = x_0$, then $F(y) - F(x_0) - f(x_0)(y - x_0) = 0$ and so the claim is obvious. If $y > x_0$, then

$$F(y) - F(x_0) = \int_{[x_0,y]} f.$$

Since $x_0, y \in I$, and $I$ is a connected set, $[x_0, y]$ is a subset of $I$, and thus we have

$$f(x_0) - \varepsilon \leq f(x) \leq f(x_0) + \varepsilon \text{ for all } x \in [x_0, y],$$

and thus

$$(f(x_0) - \varepsilon)(y - x_0) \leq \int_{[x_0,y]} f \leq (f(x_0) + \varepsilon)(y - x_0)$$

and so in particular

$$|F(y) - F(x_0) - f(x_0)(y - x_0)| \leq \varepsilon |y - x_0|$$

as desired. The case $y < x_0$ is similar and is left to the reader.
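The two conclusions of Theorem 11.9.1 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the integrand is taken to be the cosine function on $[0, 1.001]$ (so one may take $M = 1$), and the integral is approximated by a midpoint Riemann sum; the helper name `integral` is hypothetical.

```python
import math


def integral(f, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of f on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def F(x):
    """Approximation to the integral of cos over [0, x]."""
    return integral(math.cos, 0.0, x)


# Newton quotient of F at x0; it should be close to f(x0) = cos(x0).
x0, step = 1.0, 1e-3
newton_quotient = (F(x0 + step) - F(x0)) / step
print(newton_quotient, math.cos(x0))

# Lipschitz bound |F(y) - F(x)| <= M |x - y| with M = 1.
lipschitz_gap = abs(F(0.7) - F(0.2))
print(lipschitz_gap, abs(0.7 - 0.2))
```

A numerical agreement of this kind is of course not a proof; it merely shows what the inequality $|F(y) - F(x_0) - f(x_0)(y - x_0)| \leq \varepsilon |y - x_0|$ looks like in practice.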


Example 11.9.2 Recall in Exercise 9.8.5 that we constructed a monotone function $f: \mathbf{R} \to \mathbf{R}$ which was discontinuous at every rational and continuous everywhere else. By Proposition 11.6.1, this monotone function is Riemann integrable on $[0, 1]$. If we define $F: [0, 1] \to \mathbf{R}$ by $F(x) := \int_{[0,x]} f$, then $F$ is a continuous function which is differentiable at every irrational number. On the other hand, $F$ is non-differentiable at every rational number; see Exercise 11.9.1.


Informally, the first fundamental theorem of calculus asserts that

$$\left( \int_{[a,x]} f \right)'(x) = f(x)$$

given a certain number of assumptions on $f$. Roughly, this means that the derivative of an integral recovers the original function. Now we show the reverse, that the integral of a derivative recovers the original function.

Definition 11.9.3 (Antiderivatives) Let $I$ be a bounded interval, and let $f: I \to \mathbf{R}$ be a function. We say that a function $F: I \to \mathbf{R}$ is an antiderivative of $f$ if $F$ is differentiable on $I$ and $F'(x) = f(x)$ for all limit points $x$ of $I$.

Theorem 11.9.4 (Second Fundamental Theorem of Calculus) Let $a < b$ be real numbers, and let $f: [a, b] \to \mathbf{R}$ be a Riemann integrable function. If $F: [a, b] \to \mathbf{R}$ is an antiderivative of $f$, then

$$\int_{[a,b]} f = F(b) - F(a).$$


Proof The claim is trivial for $a = b$, so suppose that $a < b$; in particular every point in $[a, b]$ is now a limit point. We will use Riemann sums. The idea is to show that

$$U(f, P) \geq F(b) - F(a) \geq L(f, P)$$

for every partition $P$ of $[a, b]$. The left inequality asserts that $F(b) - F(a)$ is a lower bound for $\{U(f, P) : P \text{ is a partition of } [a, b]\}$, while the right inequality asserts that $F(b) - F(a)$ is an upper bound for $\{L(f, P) : P \text{ is a partition of } [a, b]\}$. But by Proposition 11.3.12, this means that

$$\overline{\int}_{[a,b]} f \geq F(b) - F(a) \geq \underline{\int}_{[a,b]} f,$$

but since $f$ is assumed to be Riemann integrable, both the upper and lower Riemann integral equal $\int_{[a,b]} f$. The claim follows.

We have to show the bound $U(f, P) \geq F(b) - F(a) \geq L(f, P)$. We shall just show the first inequality $U(f, P) \geq F(b) - F(a)$; the other inequality is similar.

Let $P$ be a partition of $[a, b]$. From Lemma 11.8.4 (noting from Proposition 10.1.10 that $F$ is continuous) we have

$$F(b) - F(a) = \sum_{J \in P} F[J] = \sum_{J \in P : J \neq \emptyset} F[J],$$

while from definition we have

$$U(f, P) = \sum_{J \in P : J \neq \emptyset} \sup_{x \in J} f(x) |J|.$$

Thus it will suffice to show that

$$F[J] \leq \sup_{x \in J} f(x) |J|$$

for all $J \in P$ (other than the empty set).

When $J$ is a point then the claim is clear, since both sides are zero. Now suppose that $J = [c, d]$, $(c, d]$, $[c, d)$, or $(c, d)$ for some $c < d$. Then the left-hand side is $F[J] = F(d) - F(c)$. (Note that as $F$ is continuous, we may use the simplified formula (11.1) for $F[J]$.) By the mean-value theorem, this is equal to $(d - c) F'(e)$ for some $e \in J$. But since $F'(e) = f(e)$, we thus have

$$F[J] = (d - c) f(e) = f(e) |J| \leq \sup_{x \in J} f(x) |J|$$

as desired.
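The bound $U(f, P) \geq F(b) - F(a) \geq L(f, P)$ at the heart of this proof can be observed numerically. The following sketch assumes, purely for illustration, that $f(x) = 3x^2$ on $[0, 2]$ with antiderivative $F(x) = x^3$, and uses uniform partitions; since this $f$ is monotone increasing and continuous, its supremum and infimum on each piece are the values at the right and left endpoints. The function name `upper_lower_sums` is hypothetical.

```python
def upper_lower_sums(f, a, b, n):
    """Upper and lower Riemann sums of a monotone increasing, continuous f
    on [a, b] for the uniform partition into n pieces.

    For such f, the sup on each piece is the value at its right endpoint
    and the inf is the value at its left endpoint.
    """
    h = (b - a) / n
    points = [a + i * h for i in range(n + 1)]
    upper = sum(f(points[i + 1]) * h for i in range(n))
    lower = sum(f(points[i]) * h for i in range(n))
    return upper, lower


def f(x):
    return 3 * x ** 2


def F(x):
    return x ** 3


upper, lower = upper_lower_sums(f, 0.0, 2.0, 1000)
difference = F(2.0) - F(0.0)
print(lower, difference, upper)
```

As the partition is refined the two sums squeeze together around $F(b) - F(a)$, which is how the proof identifies the integral.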


Of course, as you are all aware, one can use the second fundamental theorem of calculus to compute integrals relatively easily provided that you can find an antiderivative of the integrand $f$. Note that the first fundamental theorem of calculus ensures that every continuous Riemann integrable function has an antiderivative. For discontinuous functions, the situation is more complicated and is a graduate-level real analysis topic which will not be discussed here. Also, not every function with an antiderivative is Riemann integrable; as an example, consider the function $F: [-1, 1] \to \mathbf{R}$ defined by $F(x) := x^2 \sin(1/x^3)$ when $x \neq 0$, and $F(0) := 0$. Then $F$ is differentiable everywhere (why?), so $F'$ has an antiderivative, but $F'$ is unbounded (why?), and so is not Riemann integrable.

We now pause to mention the infamous “$+C$” ambiguity in antiderivatives:


Lemma 11.9.5 Let $I$ be a bounded interval, and let $f: I \to \mathbf{R}$ be a function. Let $F: I \to \mathbf{R}$ and $G: I \to \mathbf{R}$ be two antiderivatives of $f$. Then there exists a real number $C$ such that $F(x) = G(x) + C$ for all $x \in I$.


Proof See Exercise 11.9.2. 


— Exercises — 


Exercise 11.9.1 Let $f: [0, 1] \to \mathbf{R}$ be the function in Exercise 9.8.5. Show that for every rational number $q \in \mathbf{Q} \cap (0, 1)$, the function $F: [0, 1] \to \mathbf{R}$ defined by the formula $F(x) := \int_0^x f(y) \, dy$ is not differentiable at $q$.


Exercise 11.9.2 Prove Lemma 11.9.5. (Hint: apply the mean-value theorem, Corollary 10.2.9, or Proposition 10.3.3, to the function $F - G$. One can also prove this lemma using the second fundamental theorem of calculus (how?), but one has to be careful since we do not assume $f$ to be Riemann integrable.)


Exercise 11.9.3 Let $a < b$ be real numbers, and let $f: [a, b] \to \mathbf{R}$ be a monotone increasing function. Let $F: [a, b] \to \mathbf{R}$ be the function $F(x) := \int_{[a,x]} f$. Let $x_0$ be an element of $(a, b)$. Show that $F$ is differentiable at $x_0$ if and only if $f$ is continuous at $x_0$. (Hint: one direction is taken care of by one of the fundamental theorems of calculus. For the other, consider left and right limits of $f$ and argue by contradiction.)


11.10 Consequences of the Fundamental Theorems 


We can now give a number of useful consequences of the fundamental theorems of calculus (beyond the obvious application, that one can now compute any integral for which an antiderivative is known). The first application is the familiar integration by parts formula.

Proposition 11.10.1 (Integration by parts formula) Let $I = [a, b]$, and let $F: [a, b] \to \mathbf{R}$ and $G: [a, b] \to \mathbf{R}$ be differentiable functions on $[a, b]$ such that $F'$ and $G'$ are Riemann integrable on $I$. Then we have

$$\int_{[a,b]} F G' = F(b) G(b) - F(a) G(a) - \int_{[a,b]} F' G.$$


Proof See Exercise 11.10.1. 
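As a sanity check (not a proof), one can compare the two sides of the integration by parts formula numerically. The sketch below assumes the illustrative choices $F(x) = x$ and $G(x) = \sin x$ on $[0, 1]$, and approximates each integral by a midpoint Riemann sum; the helper `integral` is a hypothetical name.

```python
import math


def integral(f, a, b, n=10000):
    """Midpoint Riemann sum approximating the integral of f on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


a, b = 0.0, 1.0


def F(x):
    return x


def dF(x):
    return 1.0


G, dG = math.sin, math.cos

# Left-hand side: integral of F G'.
lhs = integral(lambda x: F(x) * dG(x), a, b)
# Right-hand side: boundary terms minus the integral of F' G.
rhs = F(b) * G(b) - F(a) * G(a) - integral(lambda x: dF(x) * G(x), a, b)
print(lhs, rhs)
```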


Next, we show that under certain circumstances, one can write a Riemann-Stieltjes 
integral as a Riemann integral. We begin with piecewise constant functions. 


Theorem 11.10.2 Let $\alpha: [a, b] \to \mathbf{R}$ be a monotone increasing function, and suppose that $\alpha$ is also differentiable on $[a, b]$, with $\alpha'$ being Riemann integrable. Let $f: [a, b] \to \mathbf{R}$ be a piecewise constant function on $[a, b]$. Then $f \alpha'$ is Riemann integrable on $[a, b]$, and

$$\int_{[a,b]} f \, d\alpha = \int_{[a,b]} f \alpha'.$$


Proof Since $f$ is piecewise constant, it is Riemann integrable, and since $\alpha'$ is also Riemann integrable, $f \alpha'$ is Riemann integrable by Theorem 11.4.5.

Suppose that $f$ is piecewise constant with respect to some partition $P$ of $[a, b]$; without loss of generality we may assume that $P$ does not contain the empty set. Then we have

$$\int_{[a,b]} f \, d\alpha = \text{p.c.} \int_{[P]} f \, d\alpha = \sum_{J \in P} c_J \alpha[J]$$

where $c_J$ is the constant value of $f$ on $J$. On the other hand, from Theorem 11.4.1(h) (generalized to partitions of arbitrary length; why is this generalization true?) we have

$$\int_{[a,b]} f \alpha' = \sum_{J \in P} \int_J f \alpha' = \sum_{J \in P} \int_J c_J \alpha' = \sum_{J \in P} c_J \int_J \alpha'.$$

But by the second fundamental theorem of calculus (Theorem 11.9.4), $\int_J \alpha' = \alpha[J]$, and the claim follows.


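Corollary 11.10.3 Let $\alpha: [a, b] \to \mathbf{R}$ be a monotone increasing function, and suppose that $\alpha$ is also differentiable on $[a, b]$, with $\alpha'$ being Riemann integrable. Let $f: [a, b] \to \mathbf{R}$ be a function which is Riemann-Stieltjes integrable with respect to $\alpha$ on $[a, b]$. Then $f \alpha'$ is Riemann integrable on $[a, b]$, and

$$\int_{[a,b]} f \, d\alpha = \int_{[a,b]} f \alpha'.$$

Proof Note that since $f$ and $\alpha'$ are bounded, $f \alpha'$ must also be bounded. Also, since $\alpha$ is monotone increasing and differentiable, $\alpha'$ is non-negative.

Let $\varepsilon > 0$. Then, we can find a piecewise constant function $\overline{f}$ majorizing $f$ on $[a, b]$, and a piecewise constant function $\underline{f}$ minorizing $f$ on $[a, b]$, such that

$$\int_{[a,b]} f \, d\alpha - \varepsilon \leq \int_{[a,b]} \underline{f} \, d\alpha \leq \int_{[a,b]} \overline{f} \, d\alpha \leq \int_{[a,b]} f \, d\alpha + \varepsilon.$$

Applying Theorem 11.10.2, we obtain

$$\int_{[a,b]} f \, d\alpha - \varepsilon \leq \int_{[a,b]} \underline{f} \alpha' \leq \int_{[a,b]} \overline{f} \alpha' \leq \int_{[a,b]} f \, d\alpha + \varepsilon.$$

Since $\alpha'$ is non-negative and $\underline{f}$ minorizes $f$, $\underline{f} \alpha'$ minorizes $f \alpha'$. Thus $\int_{[a,b]} \underline{f} \alpha' \leq \underline{\int}_{[a,b]} f \alpha'$ (why?). Thus

$$\int_{[a,b]} f \, d\alpha - \varepsilon \leq \underline{\int}_{[a,b]} f \alpha'.$$

Similarly we have

$$\overline{\int}_{[a,b]} f \alpha' \leq \int_{[a,b]} f \, d\alpha + \varepsilon.$$

Since these statements are true for any $\varepsilon > 0$, we must have

$$\int_{[a,b]} f \, d\alpha \leq \underline{\int}_{[a,b]} f \alpha' \leq \overline{\int}_{[a,b]} f \alpha' \leq \int_{[a,b]} f \, d\alpha$$

and the claim follows.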


Remark 11.10.4 Informally, Corollary 11.10.3 asserts that $f \, d\alpha$ is essentially equivalent to $f \frac{d\alpha}{dx} \, dx$, when $\alpha$ is differentiable. However, the advantage of the Riemann-Stieltjes integral is that it still makes sense even when $\alpha$ is not differentiable.
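The informal identity in Remark 11.10.4 can be observed numerically. The sketch below assumes the illustrative choices $f(x) = x$ and $\alpha(x) = x^2$ on $[0, 1]$, for which both sides should be close to $2/3$; the function names are hypothetical. The Stieltjes side is computed as the piecewise constant Riemann-Stieltjes integral of a piecewise constant approximation to $f$, using formula (11.1) for the $\alpha$-lengths.

```python
def stieltjes_sum(f, alpha, a, b, n=1000):
    """Sum of f(midpoint) * alpha-length over a uniform partition of [a, b].

    Assumes alpha is continuous, so each alpha-length is
    alpha(right) - alpha(left) by the simplified formula (11.1).
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        left, right = a + i * h, a + (i + 1) * h
        total += f(0.5 * (left + right)) * (alpha(right) - alpha(left))
    return total


def riemann_sum(g, a, b, n=1000):
    """Midpoint Riemann sum approximating the integral of g on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f(x):
    return x


def alpha(x):
    return x ** 2


def dalpha(x):
    return 2 * x


lhs = stieltjes_sum(f, alpha, 0.0, 1.0)                     # integral of f d(alpha)
rhs = riemann_sum(lambda x: f(x) * dalpha(x), 0.0, 1.0)     # integral of f alpha'
print(lhs, rhs)
```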


We now build up to the familiar change of variables formula. We first need a 
preliminary lemma. 


Lemma 11.10.5 (Change of variables formula I) Let $[a, b]$ be a closed interval, and let $\phi: [a, b] \to [\phi(a), \phi(b)]$ be a continuous monotone increasing function. Let $f: [\phi(a), \phi(b)] \to \mathbf{R}$ be a piecewise constant function on $[\phi(a), \phi(b)]$. Then $f \circ \phi: [a, b] \to \mathbf{R}$ is also piecewise constant on $[a, b]$, and

$$\int_{[a,b]} f \circ \phi \, d\phi = \int_{[\phi(a),\phi(b)]} f.$$


Proof We give a sketch of the proof, leaving the gaps to be filled in Exercise 11.10.2. Let $P$ be a partition of $[\phi(a), \phi(b)]$ such that $f$ is piecewise constant with respect to $P$; we may assume that $P$ does not contain the empty set. For each $J \in P$, let $c_J$ be the constant value of $f$ on $J$, thus

$$\int_{[\phi(a),\phi(b)]} f = \sum_{J \in P} c_J |J|.$$

For each interval $J$, let $\phi^{-1}(J)$ be the set $\phi^{-1}(J) := \{x \in [a, b] : \phi(x) \in J\}$. Then $\phi^{-1}(J)$ is connected (why?) and is thus an interval. Furthermore, $c_J$ is the constant value of $f \circ \phi$ on $\phi^{-1}(J)$ (why?). Thus, if we define $Q := \{\phi^{-1}(J) : J \in P\}$ (ignoring the fact that $Q$ has been used to represent the rational numbers), then $Q$ partitions $[a, b]$ (why?), and $f \circ \phi$ is piecewise constant with respect to $Q$ (why?). Thus

$$\int_{[a,b]} f \circ \phi \, d\phi = \int_{[Q]} f \circ \phi \, d\phi = \sum_{J \in P} c_J \phi[\phi^{-1}(J)].$$

But $\phi[\phi^{-1}(J)] = |J|$ (why?), and the claim follows.


Proposition 11.10.6 (Change of variables formula II) Let $[a, b]$ be a closed interval, and let $\phi: [a, b] \to [\phi(a), \phi(b)]$ be a continuous monotone increasing function. Let $f: [\phi(a), \phi(b)] \to \mathbf{R}$ be a Riemann integrable function on $[\phi(a), \phi(b)]$. Then $f \circ \phi: [a, b] \to \mathbf{R}$ is Riemann-Stieltjes integrable with respect to $\phi$ on $[a, b]$, and

$$\int_{[a,b]} f \circ \phi \, d\phi = \int_{[\phi(a),\phi(b)]} f.$$


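Proof This will be obtained from Lemma 11.10.5 in a similar manner to how Corollary 11.10.3 was obtained from Theorem 11.10.2. First observe that since $f$ is Riemann integrable, it is bounded, and then $f \circ \phi$ must also be bounded (why?).

Let $\varepsilon > 0$. Then, we can find a piecewise constant function $\overline{f}$ majorizing $f$ on $[\phi(a), \phi(b)]$, and a piecewise constant function $\underline{f}$ minorizing $f$ on $[\phi(a), \phi(b)]$, such that

$$\int_{[\phi(a),\phi(b)]} f - \varepsilon \leq \int_{[\phi(a),\phi(b)]} \underline{f} \leq \int_{[\phi(a),\phi(b)]} \overline{f} \leq \int_{[\phi(a),\phi(b)]} f + \varepsilon.$$

Applying Lemma 11.10.5, we obtain

$$\int_{[\phi(a),\phi(b)]} f - \varepsilon \leq \int_{[a,b]} \underline{f} \circ \phi \, d\phi \leq \int_{[a,b]} \overline{f} \circ \phi \, d\phi \leq \int_{[\phi(a),\phi(b)]} f + \varepsilon.$$

Since $\underline{f} \circ \phi$ is piecewise constant and minorizes $f \circ \phi$, we have

$$\int_{[a,b]} \underline{f} \circ \phi \, d\phi \leq \underline{\int}_{[a,b]} f \circ \phi \, d\phi$$

while similarly we have

$$\int_{[a,b]} \overline{f} \circ \phi \, d\phi \geq \overline{\int}_{[a,b]} f \circ \phi \, d\phi.$$

Thus

$$\int_{[\phi(a),\phi(b)]} f - \varepsilon \leq \underline{\int}_{[a,b]} f \circ \phi \, d\phi \leq \overline{\int}_{[a,b]} f \circ \phi \, d\phi \leq \int_{[\phi(a),\phi(b)]} f + \varepsilon.$$

Since $\varepsilon > 0$ was arbitrary, this implies that

$$\int_{[\phi(a),\phi(b)]} f \leq \underline{\int}_{[a,b]} f \circ \phi \, d\phi \leq \overline{\int}_{[a,b]} f \circ \phi \, d\phi \leq \int_{[\phi(a),\phi(b)]} f$$

and the claim follows.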


Combining this formula with Corollary 11.10.3, one immediately obtains the 
following familiar formula: 


Proposition 11.10.7 (Change of variables formula III) Let $[a, b]$ be a closed interval, and let $\phi: [a, b] \to [\phi(a), \phi(b)]$ be a differentiable monotone increasing function such that $\phi'$ is Riemann integrable. Let $f: [\phi(a), \phi(b)] \to \mathbf{R}$ be a Riemann integrable function on $[\phi(a), \phi(b)]$. Then $(f \circ \phi) \phi': [a, b] \to \mathbf{R}$ is Riemann integrable on $[a, b]$, and

$$\int_{[a,b]} (f \circ \phi) \phi' = \int_{[\phi(a),\phi(b)]} f.$$
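The change of variables formula can likewise be tested on a concrete example. The following sketch assumes the illustrative choices $\phi(x) = x^2$ on $[1, 2]$ (which is differentiable and monotone increasing there) and $f = \cos$ on $[\phi(1), \phi(2)] = [1, 4]$, and approximates both integrals by midpoint Riemann sums; the helper `integral` is a hypothetical name.

```python
import math


def integral(g, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of g on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def phi(x):
    return x ** 2


def dphi(x):
    return 2 * x


f = math.cos
a, b = 1.0, 2.0

# Left-hand side: integral over [a, b] of (f o phi) * phi'.
lhs = integral(lambda x: f(phi(x)) * dphi(x), a, b)
# Right-hand side: integral over [phi(a), phi(b)] of f.
rhs = integral(f, phi(a), phi(b))
print(lhs, rhs)
```

Both values should be close to $\sin 4 - \sin 1$, the value given by the second fundamental theorem of calculus.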
[a,b] [¢(a),@(b)] 


— Exercises — 


Exercise 11.10.1 Prove Proposition 11.10.1. (Hint: first use Corollary 11.5.2 and Theorem 11.4.5 
to show that FG’ and F’G are Riemann integrable. Then use the product rule (Theorem 10.1.13(d)).) 


Exercise 11.10.2 Fill in the gaps marked (why?) in the proof of Lemma 11.10.5. 


Exercise 11.10.3 Let $a < b$ be real numbers, and let $f: [a, b] \to \mathbf{R}$ be a Riemann integrable function. Let $g: [-b, -a] \to \mathbf{R}$ be defined by $g(x) := f(-x)$. Show that $g$ is also Riemann integrable, and $\int_{[-b,-a]} g = \int_{[a,b]} f$.


Exercise 11.10.4 What is the analogue of Proposition 11.10.7 when $\phi$ is monotone decreasing instead of monotone increasing? (When $\phi$ is neither monotone increasing nor monotone decreasing, the situation becomes significantly more complicated.)


Appendix A 
The Basics of Mathematical Logic 


The purpose of this appendix is to give a quick introduction to mathematical logic, 
which is the language one uses to conduct rigorous mathematical proofs. Knowing 
how mathematical logic works is also very helpful for understanding the mathemat- 
ical way of thinking, which once mastered allows you to approach mathematical 
concepts and problems in a clear and confident way—including many of the proof- 
type questions in this text. 

Writing logically is a very useful skill. It is somewhat related to, but not the same 
as, writing clearly, or efficiently, or convincingly, or informatively; ideally one would 
want to do all of these at once, but sometimes one has to make compromises, though 
with practice you’ll be able to achieve more of your writing objectives concurrently. 
Thus a logical argument may sometimes look unwieldy, excessively complicated, or 
otherwise appear unconvincing. The big advantage of writing logically, however, is 
that one can be absolutely sure that your conclusion will be correct, as long as all your 
hypotheses were correct and your steps were logical; using other styles of writing 
one can be reasonably convinced that something is true, but there is a difference 
between being convinced and being sure. 

Being logical is not the only desirable trait in writing, and in fact sometimes it 
gets in the way; mathematicians for instance often resort to short informal arguments 
which are not logically rigorous when they want to convince other mathematicians 
of a statement without going through all of the long details, and the same is true of 
course for non-mathematicians as well. So saying that a statement or argument is 
“not logical” is not necessarily a bad thing; there are often many situations when 
one has good reasons not to be emphatic about being logical. However, one should 
be aware of the distinction between logical reasoning and more informal means of 
argument, and not try to pass off an illogical argument as being logically rigorous. 
In particular, if an exercise is asking for a proof, then it is expecting you to be logical 
in your answer. 

Logic is a skill that needs to be learnt like any other, but this skill is also innate to all of you; indeed, you probably use the laws of logic unconsciously in your everyday speech and in your own internal (non-mathematical) reasoning. However, it does take a bit of training and practice to recognize this innate skill and to apply it to abstract situations such as those encountered in mathematical proofs. Because logic is innate, the laws of logic that you learn should make sense; if you find yourself having to memorize one of the principles or laws of logic here, without feeling a mental “click” or comprehending why that law should work, then you will probably not be able to use that law of logic correctly and effectively in practice. So, please don’t study this appendix the way you might cram before a final; that is going to be useless. Instead, put away your highlighter pen, and read and understand this appendix rather than merely studying it!


A.1 Mathematical Statements 


Any mathematical argument proceeds in a sequence of mathematical statements. 
These are precise statements concerning various mathematical objects (numbers, 
vectors, functions, etc.), the operations between them (addition, multiplication, dif- 
ferentiation, etc.), and the relations between them (equality, inequality, etc.). These 
objects can either be constants or variables; more on this later. Statements¹ are either
true or false.


Example A.1.1 2 + 2 = 4 is a true statement; 2 + 2 = 5 is a false statement.


Not every combination of mathematical symbols is a statement. For instance, 
= 2 + +4 = − = 2


is not a statement; we sometimes call it ill-formed or ill-defined. The statements in 
the previous example are well-formed or well-defined. Thus well-formed statements 
can be either true or false; ill-formed statements are considered to be neither true 
nor false (in fact, they are usually not considered statements at all). A more subtle 
example of an ill-formed statement is 


0/0 = 1;


division by zero is undefined, and so the above statement is ill-formed. A logical 
argument should not contain any ill-formed statements, thus for instance if an argu- 
ment uses a statement such as x/y = z, it needs to first ensure that y is not equal to 
zero. Many purported proofs of “0 = 1” or other false statements rely on overlooking
this “statements must be well-formed” criterion. 

Many of you have probably written ill-formed or otherwise inaccurate statements in your mathematical work, while intending to mean some other, well-formed and accurate statement. To a certain extent this is permissible; it is similar to misspelling some words in a sentence, or using a slightly inaccurate or ungrammatical word in place of a correct one (“She ran good” instead of “She ran well”). In many cases, the reader (or grader) can detect this misstep and correct for it. However, it looks unprofessional and suggests that you may not know what you are talking about. And if indeed you actually do not know what you are talking about, and are applying mathematical or logical rules blindly, then writing an ill-formed statement can quickly confuse you into writing more and more nonsense, usually of the sort which receives no credit in grading. So it is important, especially when just learning a subject, to take care in keeping statements well-formed and precise. Once you have more skill and confidence, of course you can afford once again to speak loosely, because you will know what you are doing and won’t be in as much danger of veering off into nonsense.

¹ More precisely, statements with no free variables are either true or false. We shall discuss free variables later on in this appendix.

One of the basic axioms of mathematical logic is that every well-formed statement 
is either true or false, but not both. (Though if there are free variables, the truth 
of a statement may depend on the values of these variables. More on this later.) 
Furthermore, the truth or falsity of a statement is intrinsic to the statement and does 
not depend on the opinion of the person viewing the statement (as long as all the 
definitions and notations are agreed upon, of course). So to prove that a statement 
is true, it suffices to show that it is not false, while to show that a statement is false, 
it suffices to show that it is not true; this is the principle underlying the powerful 
technique of proof by contradiction, which we discuss later. This axiom is viable as 
long as one is working with precise concepts, for which the truth or falsity can be 
determined (at least in principle) in an objective and consistent manner. However, 
if one is working in very non-mathematical situations, then this axiom becomes 
much more dubious, and so it can be a mistake to apply mathematical logic to 
non-mathematical situations. (For instance, a statement such as “this rock weighs 52 
pounds” is reasonably precise and objective, and so it is fairly safe to use mathematical
reasoning to manipulate it, whereas vague statements such as “this rock is heavy”,
“this piece of music is beautiful”, or “God exists” are much more problematic. So
while mathematical logic is a very useful and powerful tool, it still does have some 
limitations of applicability.) One can still attempt to apply logic (or principles similar 
to logic) in these cases (for instance, by creating a mathematical model of a real-life 
phenomenon), but this is now science or philosophy, not mathematics, and we will 
not discuss it further here. 


Remark A.1.2 There are other models of logic which attempt to deal with statements
that are not definitely true or definitely false, such as modal logic, intuitionist logic, 
or fuzzy logic, but these are well beyond the scope of this text. 


Being true is different from being useful or efficient. For instance, the statement

$$2 = 2$$

is true but unlikely to be very useful. The statement

$$4 \leq 4$$

is also true, but not very efficient (the statement $4 = 4$ is more precise). It may also be that a statement may be false yet still be useful, for instance

$$\pi = 22/7$$



is false, but is still useful as a first approximation. In mathematical reasoning, we 
only concern ourselves with truth rather than usefulness or efficiency; the reason is 
that truth is objective (everybody can agree on it), and we can deduce true statements 
from precise rules, whereas usefulness and efficiency are to some extent matters of 
opinion and do not follow precise rules. Also, even if some of the individual steps in 
an argument may not seem very useful or efficient, it is still possible (indeed, quite 
common) for the final conclusion to be quite non-trivial (i.e., not obviously true) and 
useful. 

Statements are different from expressions. Statements are true or false; expressions 
are a sequence of mathematical symbols which produces some mathematical object 
(a number, matrix, function, set, etc.) as its value. For instance 


$$2 + 3 \times 5$$

is an expression, not a statement; it produces a number as its value. Meanwhile,

$$2 + 3 \times 5 = 17$$

is a statement, not an expression. Thus it does not make any sense to ask whether $2 + 3 \times 5$ is true or false. As with statements, expressions can be well-defined or ill-defined; $2 + 3/0$, for instance, is ill-defined. More subtle examples of ill-defined expressions arise when, for instance, attempting to add a vector to a matrix or evaluating a function outside of its domain, e.g., $\sin^{-1}(2)$.
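A programming language offers a loose analogy for the distinction between well-defined and ill-defined expressions: evaluating a well-defined expression produces a value, while an ill-defined one is rejected. The sketch below uses Python, where division by zero raises `ZeroDivisionError` and `math.asin` raises `ValueError` outside the domain $[-1, 1]$; the helper name `is_well_defined` is hypothetical, and the analogy with mathematical well-definedness is only an informal one.

```python
import math


def is_well_defined(expression):
    """Return True if the zero-argument callable evaluates to a value,
    and False if Python rejects it with an arithmetic or domain error."""
    try:
        expression()
    except (ZeroDivisionError, ValueError):
        return False
    return True


print(is_well_defined(lambda: 2 + 3 * 5))     # an ordinary expression
print(is_well_defined(lambda: 2 + 3 / 0))     # division by zero
print(is_well_defined(lambda: math.asin(2)))  # outside the domain of arcsine
```

Note that the helper only reports whether a value is produced; asking whether the expression itself is "true" would make no sense, just as in the text.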

One can make statements out of expressions by using relations such as $=$, $<$, $\geq$, $\in$, $\subset$ or by using properties (such as “is prime”, “is continuous”, “is invertible”). For instance, “30 + 5 is prime” is a statement, as is “$30 + 5 \leq 42 - 7$”. Note that mathematical statements are allowed to contain English words.

One can make a compound statement from more primitive statements by using 
logical connectives such as and, or, not, if-then, if-and-only-if. We give some exam- 
ples below, in decreasing order of intuitiveness. 


Conjunction. If X is a statement and Y is a statement, the statement “X and Y” is true if X and Y are both true and is false otherwise. For instance, “2 + 2 = 4 and 3 + 3 = 6” is true, while “2 + 2 = 4 and 3 + 3 = 5” is not. Another example: “2 + 2 = 4 and 2 + 2 = 4” is true, even if it is a bit redundant; logic is concerned with truth, not efficiency.

Due to the expressiveness of the English language, one can reword the statement “X and Y” in many ways, e.g., “X and also Y”, or “Both X and Y are true”, etc. Interestingly, the statement “X, but Y” is logically the same statement as “X and Y”, but they have different connotations (both statements affirm that X and Y are both true, but the first version suggests that X and Y are in contrast to each other, while the second version suggests that X and Y support each other). Again, logic is about truth, not about connotations or suggestions.


Disjunction. If X is a statement and Y is a statement, the statement “X or Y” is true if either X or Y is true, or both. For instance, “2 + 2 = 4 or 3 + 3 = 5” is true, but “2 + 2 = 5 or 3 + 3 = 5” is not. Also “2 + 2 = 4 or 3 + 3 = 6” is true (even if it is a bit inefficient; it would be a stronger statement to say “2 + 2 = 4 and 3 + 3 = 6”). Thus the word “or” in mathematical logic defaults to inclusive or. The reason we do this is that with inclusive or, to verify “X or Y”, it suffices to verify that just one of X or Y is true; we don’t need to show that the other one is false. So we know, for instance, that “2 + 2 = 4 or 2353 + 5931 = 7284” is true without having to look at the second equation. As in the previous discussion, the statement “2 + 2 = 4 or 2 + 2 = 4” is true, even if it is highly inefficient.

If one really does want to use exclusive or, use a statement such as “Either X or 
Y is true, but not both” or “Exactly one of X or Y is true”. Exclusive or does come 
up in mathematics, but nowhere near as often as inclusive or. 


Negation. The statement “X is not true” or “X is false”, or “It is not the case that X”, is called the negation of X and is true if and only if X is false, and is false if and only if X is true. For instance, the statement “It is not the case that 2 + 2 = 5” is a true statement. Of course we could abbreviate this statement to “$2 + 2 \neq 5$”.

Negations convert “and” into “or”. For instance, the negation of “Jane Doe has black hair and Jane Doe has blue eyes” is “Jane Doe doesn’t have black hair or doesn’t have blue eyes”, not “Jane Doe doesn’t have black hair and doesn’t have blue eyes” (can you see why?). Similarly, if x is an integer, the negation of “x is even and non-negative” is “x is odd or negative”, not “x is odd and negative”. (Note how it is important here that or is inclusive rather than exclusive.) Or the negation of “$x \geq 2$ and $x \leq 6$” (i.e., “$2 \leq x \leq 6$”) is “$x < 2$ or $x > 6$”, not “$x < 2$ and $x > 6$” or “$2 < x > 6$”.

Similarly, negations convert “or” into “and”. The negation of “John Doe has 
brown hair or black hair” is “John Doe does not have brown hair and does not have 
black hair”, or equivalently “John Doe has neither brown nor black hair”. If x is 
a real number, the negation of “x ≥ 1 or x ≤ −1” is “x < 1 and x > −1” (i.e.,
−1 < x < 1).
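
Both rules for negating “and” and “or” can be checked mechanically by running through all four truth assignments. The following Python sketch does this; the function name negation_swaps_connectives is an illustrative choice, and the check is a finite truth table rather than a proof technique introduced in this text.

```python
from itertools import product

def negation_swaps_connectives() -> bool:
    """Check both negation rules over all four truth assignments."""
    for x, y in product([True, False], repeat=2):
        # not (X and Y)  is the same as  (not X) or (not Y)
        if (not (x and y)) != ((not x) or (not y)):
            return False
        # not (X or Y)  is the same as  (not X) and (not Y)
        if (not (x or y)) != ((not x) and (not y)):
            return False
    return True

# The tempting but wrong rule "not (X and Y) = (not X) and (not Y)"
# fails whenever exactly one of X, Y is true.
wrong_rule_failures = [(x, y) for x, y in product([True, False], repeat=2)
                       if (not (x and y)) != ((not x) and (not y))]

print(negation_swaps_connectives())
print(wrong_rule_failures)
```

The list of failures for the wrong rule corresponds to the Jane Doe example: someone with black hair but not blue eyes falsifies the original statement, yet does not satisfy “doesn’t have black hair and doesn’t have blue eyes”.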

It is quite possible that a negation of a statement will produce a statement which 
could not possibly be true. For instance, if x is an integer, the negation of “x is either
even or odd” is “x is neither even nor odd”, which cannot possibly be true. Remember, 
though, that even if a statement is false, it is still a statement, and it is definitely 
possible to arrive at a true statement using an argument which at times involves false 
statements. (Proofs by contradiction, for instance, fall into this category. Another 
example is proof by dividing into cases. If one divides into three mutually exclusive 
cases, Case 1, Case 2, and Case 3, then at any given time two of the cases will be 
false and only one will be true; however this does not necessarily mean that the proof 
as a whole is incorrect or that the conclusion is false.) 




Negations are sometimes unintuitive to work with, especially if there are multiple 
negations; a statement such as “It is not the case that either x is not odd, or x is not 
larger than or equal to 3, but not both” is not particularly pleasant to use. Fortunately, 
one rarely has to work with more than one or two negations at a time, since often 
negations cancel each other. For instance, the negation of “X is not true” is just 
“X is true”, or more succinctly just “X”. Of course one should be careful when
negating more complicated expressions because of the switching of “and” and “or”, 
and similar issues. 


If and only if (iff). If X is a statement, and Y is a statement, we say that “X is true if 
and only if Y is true”, whenever X is true, Y has to be also, and whenever Y is true, 
X has to be also (i.e., X and Y are “equally true”). Other ways of saying the same 
thing are “X and Y are logically equivalent statements”, or “X is true iff Y is true”, or 
“X ⟺ Y”. Thus for instance, if x is a real number, then the statement “x = 3 if and
only if 2x = 6” is true: this means that whenever x = 3 is true, then 2x = 6 is true,
and whenever 2x = 6 is true, then x = 3 is true. On the other hand, the statement
“x = 3 if and only if x² = 9” is false; while it is true that whenever x = 3 is true,
x² = 9 is also true, it is not the case that whenever x² = 9 is true, x = 3 is also
automatically true (think of what happens when x = −3).
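
The two examples above can be explored numerically. The sketch below (in Python) tests each “if and only if” statement on a finite sample of integers; the sample range and the helper name iff are arbitrary choices, and passing on a sample is of course only evidence, not a proof for all real x.

```python
def iff(p: bool, q: bool) -> bool:
    """'p if and only if q': both true or both false."""
    return p == q

sample = range(-10, 11)  # a finite sample of integers, chosen arbitrarily

# "x = 3 if and only if 2x = 6" holds at every sampled x.
first_holds = all(iff(x == 3, 2 * x == 6) for x in sample)

# "x = 3 if and only if x^2 = 9" fails somewhere; collect where.
second_failures = [x for x in sample if not iff(x == 3, x * x == 9)]

print(first_holds)
print(second_failures)
```

The only sampled value at which the second statement fails is x = −3, where x² = 9 is true but x = 3 is false.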

Statements that are equally true are also equally false: if X and Y are logically 
equivalent, and X is false, then Y has to be false also (because if Y were true, then 
X would also have to be true). Conversely, any two statements which are equally 
false will also be logically equivalent. Thus for instance 2 + 2 = 5 if and only if
4 + 4 = 10.

Sometimes it is of interest to show that more than two statements are logically 
equivalent; for instance, one might want to assert that three statements X, Y, and Z 
are all logically equivalent. This means whenever one of the statements is true, then 
all of the statements are true; and it also means that if one of the statements is false, 
then all of the statements are false. This may seem like a lot of logical implications to 
prove, but in practice, once one demonstrates enough logical implications between 
X, Y, and Z, one can often conclude all the others and conclude that they are all 
logically equivalent. See for instance Exercises A.1.5, A.1.6. 


— Exercises — 
Exercise A.1.1 What is the negation of the statement “either X is true, or Y is true, but not both”?


Exercise A.1.2 What is the negation of the statement “X is true if and only if Y is true”? (There 
may be multiple ways to phrase this negation.) 


Exercise A.1.3 Suppose that you have shown that whenever X is true, then Y is true, and whenever 
X is false, then Y is false. Have you now demonstrated that X and Y are logically equivalent? 
Explain. 


Exercise A.1.4 Suppose that you have shown that whenever X is true, then Y is true, and whenever
Y is false, then X is false. Have you now demonstrated that X is true if and only if Y is true? Explain. 


Exercise A.1.5 Suppose you know that X is true if and only if Y is true, and you know that Y is 
true if and only if Z is true. Is this enough to show that X, Y, Z are all logically equivalent? Explain.


Exercise A.1.6 Suppose you know that whenever X is true, then Y is true; that whenever Y is true,
then Z is true; and whenever Z is true, then X is true. Is this enough to show that X, Y, Z are all
logically equivalent? Explain.


A.2 Implication 


Now we come to the least intuitive of the commonly used logical connectives— 
implication. If X is a statement, and Y is a statement, then “if X, then Y” is the 
implication from X to Y; it is also written “when X is true, Y is true”, or “X implies 
Y” or “Y is true when X is” or “X is true only if Y is true” (this last one takes a 
bit of mental effort to see). What this statement “if X, then Y” means depends on 
whether X is true or false. If X is true, then “if X, then Y” is true when Y is true, 
and false when Y is false. If however X is false, then “if X, then Y” is always true, 
regardless of whether Y is true or false! To put it another way, when X is true, the 
statement “if X, then Y” implies that Y is true. But when X is false, the statement 
“if X, then Y” offers no information about whether Y is true or not; the statement is
true, but vacuous (i.e., does not convey any new information beyond the fact that the 
hypothesis is false). 


Examples A.2.1 If x is an integer, then the statement “If x = 2, then x² = 4” is true,
regardless of whether x is actually equal to 2 or not (though this statement is only
likely to be useful when x is equal to 2). This statement does not assert that x is equal
to 2 and does not assert that x² is equal to 4, but it does assert that when and if x is
equal to 2, then x² is equal to 4. If x is not equal to 2, the statement is still true but
offers no conclusion on x or x².

Some special cases of the above implication: the implication “If 2 = 2, then
2² = 4” is true (true implies true). The implication “If 3 = 2, then 3² = 4” is true
(false implies false). The implication “If −2 = 2, then (−2)² = 4” is true (false
implies true). The latter two implications are considered vacuous: they do not offer
any new information since their hypothesis is false. (Nevertheless, it is still possible
to employ vacuous implications to good effect in a proof; a vacuously true statement
is still true. We shall see one such example shortly.)


As we see, the falsity of the hypothesis does not destroy the truth of an implication;
in fact it is just the opposite! (When a hypothesis is false, the implication
is automatically true.) The only way to disprove an implication is to show that the
hypothesis is true while the conclusion is false. Thus “If 2 + 2 = 4, then 4 + 4 = 2”
is a false implication. (True does not imply false.)

One can also think of the statement “if X, then Y” as “Y is at least as true as 
X”—if X is true, then Y also has to be true, but if X is false, Y could be as false 
as X, but it could also be true. This should be compared with “X if and only if Y”, 
which asserts that X and Y are equally true. 
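
The truth table for implication described above can be written out in a few lines of Python. This is only a sketch: the function name implies is an illustrative choice, and the encoding “(not X) or Y” is one standard way to express the rule that an implication fails only when the hypothesis is true and the conclusion is false.

```python
from itertools import product

def implies(x: bool, y: bool) -> bool:
    """Truth value of 'if x, then y'."""
    return (not x) or y

rows = [(x, y, implies(x, y)) for x, y in product([True, False], repeat=2)]
for x, y, value in rows:
    print(f"X={x!s:5}  Y={y!s:5}  (if X, then Y)={value}")

# "Y is at least as true as X": with False < True, this is just X <= Y.
agrees_with_ordering = all(implies(x, y) == (x <= y)
                           for x, y in product([True, False], repeat=2))
print(agrees_with_ordering)
```

Only the row with X true and Y false gives a false implication; the comparison X <= Y on truth values reproduces the same table, which matches the “at least as true as” reading.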

Vacuously true implications are often used in ordinary speech, sometimes without 
knowing that the implication is vacuous. A somewhat frivolous example is “If wishes
were wings, then pigs would fly”. (The statement “hell freezes over” is also a popular
choice for a false hypothesis.) A more serious one is “If John had left work at 5 pm, 
then he would be here by now”. This kind of statement is often used in a situation 
in which the conclusion and hypothesis are both false; but the implication is still 
true regardless. This statement, by the way, can be used to illustrate the technique 
of proof by contradiction: if you believe that “If John had left work at 5 pm, then he 
would be here by now”, and you also know that “John is not here by now”, then you
can conclude that “John did not leave work at 5 pm”, because John leaving work at
5 pm would lead to a contradiction. Note how a vacuous implication can be used to 
derive a useful truth. 

To summarize, implications are sometimes vacuous, but this is not actually a 
problem in logic, since these implications are still true, and vacuous implications 
can still be useful in logical arguments. In particular, one can safely use statements 
like “If X, then Y” without necessarily having to worry about whether the hypothesis 
X is actually true or not (i.e., whether the implication is vacuous or not). 

Implications can also be true even when there is no causal link between the 
hypothesis and conclusion. The statement “If 1 + 1 = 2, then Washington D.C. is 
the capital of the United States” is true (true implies true), although rather odd; 
the statement “If 2 + 2 = 3, then New York is the capital of the United States” is 
similarly true (false implies false). Of course, such a statement may be unstable (the 
capital of the United States may one day change, while 1 + 1 will always remain 
equal to 2) but it is true, at least for the moment. While it is possible to use acausal
implications in a logical argument, it is not recommended as it can cause unneeded
confusion. (Thus, for instance, while it is true that a false statement can be used to 
imply any other statement, true or false, doing so arbitrarily would probably not be 
helpful to the reader.) 

To prove an implication “If X, then Y”, the usual way to do this is to first assume 
that X is true, and use this (together with whatever other facts and hypotheses you 
have) to deduce Y. This is still a valid procedure even if X later turns out to be false; 
the implication does not guarantee anything about the truth of X and only guarantees 
the truth of Y conditionally on X first being true. For instance, the following is a 
valid proof of a true proposition, even though both hypothesis and conclusion of the 
proposition are false: 


Proposition A.2.2 If 2 + 2 = 5, then 4 = 10 − 4.


Proof Assume 2 + 2 = 5. Multiplying both sides by 2, we obtain 4 + 4 = 10. Subtracting
4 from both sides, we obtain 4 = 10 − 4 as desired.


On the other hand, a common error is to prove an implication by first assuming the 
conclusion and then arriving at the hypothesis. For instance, the following proposition 
is correct, but the proof is not: 


Proposition A.2.3. Suppose that 2x + 3 = 7. Show that x = 2. 


Proof (Incorrect) x = 2; so 2x = 4; so 2x + 3 = 7.


When doing proofs, it is important that you are able to distinguish the hypothesis
from the conclusion; there is a danger of getting hopelessly confused if this distinction 
is not clear. 

Here is a short proof which uses implications which are possibly vacuous. 


Theorem A.2.4 Suppose that n is an integer. Then n(n + 1) is an even integer. 


Proof Since n is an integer, n is even or odd. If n is even, then n(n + 1) is also even, 
since any multiple of an even number is even. If n is odd, then n + 1 is even, which 
again implies that n(n + 1) is even. Thus in either case n(n + 1) is even, and we are 
done. 
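
The case split in this proof can be mirrored in a short Python check. This is a sketch only: the function name and the tested range are arbitrary choices, and a finite check of this kind supplements the proof rather than replacing it.

```python
def product_of_consecutive_is_even(n: int) -> bool:
    """Mirror the two cases of the proof for a single integer n."""
    if n % 2 == 0:
        # Case 1: n is even, so any multiple of n, including n(n + 1), is even.
        return (n * (n + 1)) % 2 == 0
    else:
        # Case 2: n is odd, so n + 1 is even, and again n(n + 1) is even.
        return (n + 1) % 2 == 0 and (n * (n + 1)) % 2 == 0

checked = all(product_of_consecutive_is_even(n) for n in range(-1000, 1001))
print(checked)
```

For any particular n only one branch of the function is ever executed, yet both branches have to be written down, since the code (like the proof) does not know in advance which case will occur.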


Note that this proof relied on two implications: “if n is even, then n(n + 1) is
even”, and “if n is odd, then n(n + 1) is even”. Since n cannot be both odd and even,
at least one of these implications has a false hypothesis and is therefore vacuous. 
Nevertheless, both these implications are true, and one needs both of them in order 
to prove the theorem, because we don’t know in advance whether n is even or odd. 
And even if we did, it might not be worth the trouble to check it. For instance, as a 
special case of this theorem we immediately know 


Corollary A.2.5 Let n = (253 + 142) ∗ 123 − (423 + 198)^342 + 538 − 213. Then
n(n + 1) is an even integer. 


In this particular case, one can work out exactly which parity n has (even or odd)
and then use only one of the two implications in the above theorem, discarding the
vacuous one. This may seem like it is more efficient, but it is a false economy, because
one then has to determine what the parity of n is, and this requires a bit of effort, more
effort than it would take if we had just left both implications, including the vacuous 
one, in the argument. So, somewhat paradoxically, the inclusion of vacuous, false, 
or otherwise “useless” statements in an argument can actually save you effort in the 
long run! (I’m not suggesting, of course, that you ought to pack your proofs with lots 
of time-wasting and irrelevant statements; all I’m saying here is that you need not 
be unduly concerned that some hypotheses in your argument might not be correct, 
as long as your argument is still structured to give the correct conclusion regardless 
of whether those hypotheses were true or false.) 

The statement “If X, then Y” is not the same as “If Y, then X”; for instance, while
“If x = 2, then x² = 4” is true, “If x² = 4, then x = 2” can be false if x is equal to
−2. These two statements are called converses of each other; thus the converse of
a true implication is not necessarily another true implication. We use the statement
“X if and only if Y” to denote the statement that “If X, then Y; and if Y, then X”.
Thus for instance, we can say that x = 2 if and only if 2x = 4, because if x = 2 then 
2x = 4, while if 2x = 4 then x = 2. One way of thinking about an if-and-only-if 
statement is to view “X if and only if Y” as saying that X is just as true as Y; if one 
is true then so is the other, and if one is false, then so is the other. For instance, the 
statement “If 3 = 2, then 6 = 4” is true, since both hypothesis and conclusion are 
false. (Under this view, “If X, then Y” can be viewed as a statement that Y is at least
as true as X.) Thus one could say “X and Y are equally true” instead of “X if and
only if Y”.

Similarly, the statement “If X is true, then Y is true” is not the same as “If X is 
false, then Y is false”. Saying that “if x = 2, then x² = 4” does not imply that “if
x ≠ 2, then x² ≠ 4”, and indeed we have x = −2 as a counterexample in this case.
If-then statements are not the same as if-and-only-if statements. (If we knew that “X
is true if and only if Y is true”, then we would also know that “X is false if and only
if Y is false”.) The statement “If X is false, then Y is false” is sometimes called the
inverse of “If X is true, then Y is true”; thus the inverse of a true implication is not 
necessarily a true implication. 

If you know that “If X is true, then Y is true”, then it is also true that “If Y is false,
then X is false” (because if Y is false, then X can’t be true, since that would imply Y
is true, a contradiction). For instance, if we knew that “If x = 2, then x² = 4”, then
we also know that “If x² ≠ 4, then x ≠ 2”. Or if we knew “If John had left work at
5 pm, he would be here by now”, then we also know “If John isn’t here now, then he
could not have left work at 5 pm”. The statement “If Y is false, then X is false” is
known as the contrapositive of “If X, then Y”, and both statements are equally true.
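
The relationships between an implication and its converse, inverse, and contrapositive can be confirmed by a truth table. The Python sketch below does so; the helper implies, encoded as “(not X) or Y”, is an illustrative assumption rather than notation used in this text.

```python
from itertools import product

def implies(x: bool, y: bool) -> bool:
    """Truth value of 'if x, then y'."""
    return (not x) or y

assignments = list(product([True, False], repeat=2))

# Contrapositive "if not Y, then not X" agrees with "if X, then Y" everywhere.
contrapositive_equivalent = all(
    implies(x, y) == implies(not y, not x) for x, y in assignments)

# Converse "if Y, then X" and inverse "if not X, then not Y" do not.
converse_differs_at = [(x, y) for x, y in assignments
                       if implies(x, y) != implies(y, x)]
inverse_differs_at = [(x, y) for x, y in assignments
                      if implies(x, y) != implies(not x, not y)]

print(contrapositive_equivalent)
print(converse_differs_at)
print(inverse_differs_at)
```

The converse and the inverse disagree with the original implication on exactly the same assignments, which reflects the fact that the inverse is the contrapositive of the converse.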

In particular, if you know that X implies something which is known to be false, 
then X itself must be false. This is the idea behind proof by contradiction or reductio 
ad absurdum: to show something must be false, assume first that it is true, and show 
that this implies something which you know to be false (e.g., that a statement is 
simultaneously true and not true). For instance: 


Proposition A.2.6 Let x be a positive number such that sin(x) = 1. Then
x ≥ π/2.


Proof Suppose for sake of contradiction that x < π/2. Since x is positive, we thus
have 0 < x < π/2. Since sin(x) is increasing for 0 < x < π/2, and sin(0) = 0 and
sin(π/2) = 1, we thus have 0 < sin(x) < 1. But this contradicts the hypothesis that
sin(x) = 1. Hence x ≥ π/2.


Note that one feature of proof by contradiction is that at some point in the proof 
you assume a hypothesis (in this case, that x < π/2) which later turns out to be false.
Note however that this does not alter the fact that the argument remains valid, and 
that the conclusion is true; this is because the ultimate conclusion does not rely on 
that hypothesis being true (indeed, it relies instead on it being false!). 

Proof by contradiction is particularly useful for showing “negative” statements— 
that X is false, that a is not equal to b, that kind of thing. But the line between 
positive and negative statements is sort of blurry. (Is the statement x ≥ 2 a positive
or negative statement? What about its negation, that x < 2?) So this is not a hard and
fast rule. 

Logicians often use special symbols to denote logical connectives; for instance “X
implies Y” can be written “X ⟹ Y”, “X is not true” can be written “∼X”, “!X”, or
“¬X”, “X and Y” can be written “X ∧ Y” or “X&Y”, and so forth. But for general-purpose
mathematics, these symbols are not often used; English words are often
more readable and don’t take up much more space. Also, using these symbols tends
to blur the line between expressions and statements; it’s not as easy to understand
“((x = 3) ∧ (y = 5)) ⟹ (x + y = 8)” as “If x = 3 and y = 5, then x + y = 8”.
So in general I would not recommend using these symbols (except possibly for ⟹,
which is a very intuitive symbol).


A.3 The Structure of Proofs 


To prove a statement, one often starts by assuming the hypothesis and working one’s 
way toward a conclusion; this is the direct approach to proving a statement. Such a 
proof might look something like the following: 


Proposition A.3.1 A implies B. 


Proof Assume A is true. Since A is true, C is true. Since C is true, D is true. Since 
D is true, B is true, as desired. 


An example of such a direct approach is


Proposition A.3.2 If x = π, then sin(x/2) + 1 = 2.


Proof Let x = π. Since x = π, we have x/2 = π/2. Since x/2 = π/2, we have
sin(x/2) = 1. Since sin(x/2) = 1, we have sin(x/2) + 1 = 2.


In the above proof, we started at the hypothesis and moved steadily from there 
toward a conclusion. It is also possible to work backward from the conclusion and
see what it would take to imply it. For instance, a typical proof of Proposition
A.3.1 of this sort might look like the following: 


Proof To show B, it would suffice to show D. Since C implies D, we just need to 
show C. But C follows from A. 


As an example of this, we give another proof of Proposition A.3.2: 


Proof To show sin(x/2) + 1 = 2, it would suffice to show that sin(x/2) = 1. Since
x/2 = π/2 would imply sin(x/2) = 1, we just need to show that x/2 = π/2. But
this follows since x = π.


Logically speaking, the above two proofs of Proposition A.3.2 are the same, 
just arranged differently. Note how this proof style is different from the (incorrect) 
approach of starting with the conclusion and seeing what it would imply (as in 
Proposition A.2.3); instead, we start with the conclusion and see what would imply 
it. 

Another example of a proof written in this backward style is the following: 


Proposition A.3.3 Let 0 < r < 1 be a real number. Then the series ∑_{n=1}^{∞} n r^n is
convergent.


Proof To show this series is convergent, it suffices by the ratio test to show that the
ratio


|r^{n+1}(n + 1) / (r^n n)| = r (n + 1)/n


converges to something less than 1 as n → ∞. Since r is already less than 1, it will
be enough to show that (n + 1)/n converges to 1. But since (n + 1)/n = 1 + 1/n, it suffices to
show that 1/n → 0. But this is clear since n → ∞.
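
The behaviour of the ratio in this proof can be observed numerically. The following Python sketch uses r = 0.5, an arbitrary choice satisfying 0 < r < 1, and the helper name term is illustrative; the numbers are consistent with the proof but do not replace it.

```python
def term(n: int, r: float) -> float:
    """The n-th term n * r**n of the series."""
    return n * r ** n

r = 0.5  # any value with 0 < r < 1; 0.5 is an arbitrary choice

# Ratios of consecutive terms: term(n+1)/term(n) = r * (n+1)/n, which tends to r.
ratios = [term(n + 1, r) / term(n, r) for n in (1, 10, 100, 1000)]
print(ratios)

# Partial sums settle down, consistent with convergence.
partial_sum = sum(term(n, r) for n in range(1, 200))
print(partial_sum)
```

The ratios decrease toward r = 0.5 as n grows, and the partial sum is very close to 2, which agrees with the known closed form r/(1 − r)² for this series.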


One could also do any combination of moving forward from the hypothesis and 
backward from the conclusion. For instance, the following would be a valid proof of 
Proposition A.3.1: 


Proof To show B, it would suffice to show D. So now let us show D. Since we have 
A by hypothesis, we have C. Since C implies D, we thus have D as desired. 


Again, from a logical point of view this is exactly the same proof as before. Thus 
there are many ways to write the same proof down; how you do so is up to you, but 
certain ways of writing proofs are more readable and natural than others, and different 
arrangements tend to emphasize different parts of the argument. (Of course, when 
you are just starting out doing mathematical proofs, you’re generally happy to get 
some proof of a result and don’t care so much about getting the “best” arrangement 
of that proof; but the point here is that a proof can take many different forms.) 

The above proofs were pretty simple because there was just one hypothesis and 
one conclusion. When there are multiple hypotheses and conclusions, and the proof 
splits into cases, then proofs can get more complicated. For instance a proof might 
look as tortuous as this: 


Proposition A.3.4 Suppose that A and B are true. Then C and D are true. 


Proof Since A is true, E is true. From E and B we know that F is true. Also, in
light of A, to show D it suffices to show G. There are now two cases: H and I. If H
is true, then from F and H we obtain C, and from A and H we obtain G. If instead
I is true, then from I we have G, and from I and G we obtain C. Thus in both cases
we obtain both C and G, and hence C and D.


Incidentally, the above proof could be rearranged into a much tidier manner, 
but you at least get the idea of how complicated a proof could become. To show 
an implication there are several ways to proceed: you can work forward from the 
hypothesis; you can work backward from the conclusion; or you can divide into cases
in the hope of splitting the problem into several easier subproblems. Another is to argue
by contradiction; for instance, you can have an argument of the form


Proposition A.3.5 Suppose that A is true. Then B is false. 


Proof Suppose for sake of contradiction that B is true. This would imply that C is 
true. But since A is true, this implies that D is true; which contradicts C. Thus B 
must be false.


As you can see, there are several things to try when attempting a proof. With
experience, it will become clearer which approaches are likely to work easily, which 
ones will probably work but require much effort, and which ones are probably going 
to fail. In many cases there is really only one obvious way to proceed. Of course, 
there may definitely be multiple ways to approach a problem, so if you see more than 
one way to begin a problem, you can just try whichever one looks the easiest, but be 
prepared to switch to another approach if it begins to look hopeless. 

Also, it helps when doing a proof to keep track of which statements are known
(either as hypotheses, or deduced from the hypotheses, or coming from other theorems
and results) and which statements are desired (either the conclusion, or something
which would imply the conclusion, or some intermediate claim or lemma which
will be useful in eventually obtaining the conclusion). Mixing the two up is almost
always a bad idea and can lead to one getting hopelessly lost in a proof.


A.4 Variables and Quantifiers 


One can get quite far in logic just by starting with primitive statements (such as “2 + 
2 = 4” or “John has black hair”), then forming compound statements using logical
connectives, and then using various laws of logic to pass from one’s hypotheses 
to one’s conclusions; this is known as propositional logic or Boolean logic. (It is 
possible to list a dozen or so such laws of propositional logic, which are sufficient 
to do everything one wants to do, but I have deliberately chosen not to do so here, 
because you might then be tempted to memorize that list, and that is not how one 
should learn how to do logic, unless one happens to be a computer or some other 
non-thinking device. However, if you really are curious as to what the formal laws 
of logic are, look up “laws of propositional logic” or something similar in the library 
or on the internet.) 

However, to do mathematics, this level of logic is insufficient, because it does not 
incorporate the fundamental concept of variables—those familiar symbols such as 
x or n which denote various quantities which are unknown, or set to some value, or
assumed to obey some property. Indeed we have already sneaked in some of these 
variables in order to illustrate some of the concepts in propositional logic (mainly 
because it gets boring after a while to talk endlessly about variable-free statements 
such as 2 + 2 = 4 or “Jane has black hair”). Mathematical logic is thus the same as 
propositional logic but with the additional ingredient of variables added. 

A variable is a symbol, such as n or x, which denotes a certain type of mathematical
object—an integer, a vector, a matrix, that kind of thing. In almost all circumstances, 
the type of object that the variable represents should be declared, otherwise it will be 
difficult to make well-formed statements using it. (There are very few true statements 
that one can make about variables without knowing the type of variables involved. 
For instance, given a variable x of any type whatsoever, it is true that x = x, and if 
we also know that x = y, then we can conclude that y = x. But one cannot say, for 
instance, that x + y = y + x, until we know what type of objects x and y are and
whether they support the operation of addition; for instance, the above statement is
ill-formed if x is a matrix and y is a vector. Thus if one actually wants to do some 
useful mathematics, then every variable should have an explicit type.) 

One can form expressions and statements involving variables, for instance, if x is 
a real variable (i.e., a variable which is a real number), x + 3 is an expression, and 
x + 3 = 5 is a statement. But now the truth of a statement may depend on the value 
of the variables involved; for instance the statement x + 3 = 5 is true if x is equal to 
2, but is false if x is not equal to 2. Thus the truth of a statement involving a variable 
may depend on the context of the statement—in this case, it depends on what x is 
supposed to be. (This is a modification of the rule for propositional logic, in which 
all statements have a definite truth value.) 

Sometimes we do not set a variable to be anything (other than specifying its type). 
Thus, we could consider the statement x + 3 = 5 where x is an unspecified real 
number. In such a case we call this variable a free variable; thus we are considering 
x + 3 = 5 with x a free variable. Statements with free variables might not have a
definite truth value, as they depend on an unspecified variable. For instance, we have 
already remarked that x + 3 = 5 does not have a definite truth value if x is a free 
real variable, though of course for each given value of x the statement is either true 
or false. On the other hand, the statement (x + 1)² = x² + 2x + 1 is true for every
real number x, and so we can regard this as a true statement even when x is a free 
variable. 

At other times, we set a variable to equal a fixed value, by using a statement 
such as “Let x = 2” or “Set x equal to 2”. In this case, the variable is known as a 
bound variable, and statements involving only bound variables and no free variables 
do have a definite truth value. For instance, if we set x = 342, then the statement 
“x + 135 = 477” now has a definite truth value, whereas if x is a free real variable 
then “x + 135 = 477” could be either true or false, depending on what x is. Thus, 
as we have said before, the truth of a statement such as “x + 135 = 477” depends 
on the context—whether x is free or bound, and if it is bound, what it is bound to. 
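
A loose programming analogy, offered only as a sketch, may help here: a statement with a free variable behaves like a function awaiting its argument, while binding the variable corresponds to calling the function with a specific value. The function name statement below is hypothetical.

```python
def statement(x: float) -> bool:
    """The statement 'x + 135 = 477', with x left as a parameter (free)."""
    return x + 135 == 477

# With x unspecified, 'statement' is a function, not a truth value.
# Binding x to a value yields a definite truth value.
x = 342
bound_value = statement(x)        # x is bound to 342
other_value = statement(100)      # x is bound to 100 instead

print(bound_value, other_value)
```

The same expression yields different truth values depending on what x is bound to, which is the dependence on context described above.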

One can also turn a free variable into a bound variable by using the quantifiers 
“for all” or “for some”. For instance, the statement


(x + 1)² = x² + 2x + 1


is a statement with one free variable x and need not have a definite truth value, but
the statement


(x + 1)² = x² + 2x + 1 for all real numbers x


is a statement with one bound variable x and now has a definite truth value (in this
case, the statement is true). Similarly, the statement


x + 3 = 5


has one free variable and does not have a definite truth value, but the statement


x + 3 = 5 for some real number x


is true, since it is true for x = 2. On the other hand, the statement


x + 3 = 5 for all real numbers x


is false, because there are some (indeed, there are many) real numbers x for which 
x + 3 is not equal to 5. 


Universal quantifiers. Let P(x) be some statement depending on a free variable 
x. The statement “P(x) is true for all x of type T” means that given any x of type 
T, the statement P(x) is true regardless of what the exact value of x is. In other 
words, the statement is the same as saying “if x is of type T, then P(x) is true”.
Thus the usual way to prove such a statement is to let x be a free variable of type
T (by saying something like “Let x be any object of type T”), and then to prove
P(x) for that object. The statement becomes false if one can produce even a single
counterexample, i.e., an element x which lies in T but for which P(x) is false. For
instance, the statement “x² is greater than x for all positive x” can be shown to be
false by producing a single example, such as x = 1 or x = 1/2, where x² is not
greater than x.

On the other hand, producing a single example where P(x) is true will not show 
that P(x) is true for all x. For instance, just because the equation x + 3 = 5 has a 
solution when x = 2 does not imply that x + 3 = 5 for all real numbers x; it only 
shows that x + 3 = 5 is true for some real number x. (This is the source of the often-quoted,
though somewhat inaccurate, slogan “One cannot prove a statement just by
giving an example”. The more precise statement is that one cannot prove a “for all” 
statement by examples, though one can certainly prove “for some” statements this 
way, and one can also disprove “for all” statements by a single counterexample.) 
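
The asymmetry between examples and counterexamples can be seen in a short Python sketch. The sample of positive numbers is an arbitrary finite choice, and the name P stands for the claim “x² is greater than x”; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def P(x) -> bool:
    """The claim 'x^2 is greater than x' for a single value x."""
    return x * x > x

# A finite sample of positive numbers (arbitrary choice).
sample = [Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5)]

counterexamples = [x for x in sample if not P(x)]
holds_on_sample = all(P(x) for x in sample)

print(counterexamples)
print(holds_on_sample)
```

A single entry in counterexamples is enough to refute the “for all” statement. Had the list been empty, however, that would only show the claim holds on the sample, not for all positive x.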

It occasionally happens that there are in fact no variables x of type T. In that case 
the statement “P(x) is true for all x of type T” is vacuously true—it is true but has 
no content, similar to a vacuous implication. For instance, the statement 


6 < 2x < 4 for all 3 < x < 2


is true, and easily proven, but is vacuous. (Such a vacuously true statement can still 
be useful in an argument, though this doesn’t happen very often.) 

One can use phrases such as “For every” or “For each” instead of “For all”, e.g.,
one can rephrase “(x + 1)² = x² + 2x + 1 for all real numbers x” as “For each
real number x, (x + 1)² is equal to x² + 2x + 1”. For the purposes of logic these
rephrasings are equivalent. The symbol ∀ can be used instead of “For all”, thus
for instance “∀x ∈ X : P(x) is true” or “P(x) is true ∀x ∈ X” is synonymous with
“P(x) is true for all x ∈ X”.


Existential quantifiers. The statement “P(x) is true for some x of type T” means
that there exists at least one x of type T for which P(x) is true, although it may be
that there is more than one such x. (One would use a quantifier such as “for exactly
one x” instead of “for some x” if one wanted both existence and uniqueness of such
an x.) To prove such a statement it suffices to provide a single example of such an x.
For instance, to show that 


x² + 2x − 8 = 0 for some real number x


all one needs to do is find a single real number x for which x² + 2x − 8 = 0; for
instance, x = 2 will do. (One could also use x = −4, but one doesn’t need to use
both.) Note that one has the freedom to select x to be anything one wants when 
proving a for some statement; this is in contrast to proving a for all statement, where 
one has to let x be arbitrary. (One can compare the two statements by thinking of 
two games between you and an opponent. In the first game, the opponent gets to pick 
what x is, and then you have to prove P(x); if you can always win this game, then 
you have proven that P(x) is true for all x. In the second game, you get to choose 
what x is, and then you prove P(x); if you can win this game, you have proven that 
P(x) is true for some x.) 

Usually, saying something is true for all x is much stronger than just saying it is 
true for some x. There is one exception, though: if the condition on x is impossible 
to satisfy, then the “for all” statement is vacuously true, but the “for some” statement is 
false. For instance, 

6 < 2x < 4 for all 3 < x < 2 

is true, but 

6 < 2x < 4 for some 3 < x < 2 

is false. 

One can use phrases such as “For at least one” or “There exists ... such that” 
instead of “For some”. For instance, one can rephrase “$x^2 + 2x - 8 = 0$ for some real 
number x” as “There exists a real number x such that $x^2 + 2x - 8 = 0$”. The symbol 
$\exists$ can be used instead of “There exists ... such that”; thus, for instance, “$\exists x \in X$ : P(x) 
is true” is synonymous with “P(x) is true for some $x \in X$”. 
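
On a finite domain the two quantifiers can be checked mechanically. The following Python sketch is only an illustration: the finite range of integers standing in for "x of type T" is an assumption made here (a program cannot loop over every real number), and the variable names are hypothetical. The built-in `all` plays the role of "for all", `any` plays the role of "for some", and an empty domain exhibits the vacuous case.

```python
# Finite stand-in for "x of type T": the integers from -10 to 10.
# (An assumption for illustration; the text quantifies over real numbers.)
domain = range(-10, 11)

# "x^2 + 2x - 8 = 0 for some x": a single witness (x = 2 or x = -4) suffices.
exists_root = any(x * x + 2 * x - 8 == 0 for x in domain)

# "x^2 + 2x - 8 = 0 for all x": fails, for instance at x = 0.
all_roots = all(x * x + 2 * x - 8 == 0 for x in domain)

# No x satisfies 3 < x < 2, so the restricted domain is empty.
empty_domain = [x for x in domain if 3 < x < 2]

# "6 < 2x < 4 for all 3 < x < 2" is vacuously true ...
vacuous_for_all = all(6 < 2 * x < 4 for x in empty_domain)
# ... while "6 < 2x < 4 for some 3 < x < 2" is false.
vacuous_for_some = any(6 < 2 * x < 4 for x in empty_domain)

print(exists_root, all_roots, vacuous_for_all, vacuous_for_some)
```

A finite check of this kind can confirm a "for some" statement outright, but for a "for all" statement over an infinite type it can only fail to find a counterexample.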


A.5 Nested Quantifiers 


One can nest two or more quantifiers together. For instance, consider the statement 


For every positive number x, there exists a positive number y such that $y^2 = x$. 


What does this statement mean? It means that for each positive number x, the 
statement 

There exists a positive number y such that $y^2 = x$ 

is true. In other words, one can find a positive square root of x for each positive number 
x. So the statement is saying that every positive number has a positive square root. 

To continue the gaming metaphor, suppose you play a game where your opponent 
first picks a positive number x, and then you pick a positive number y. You win the 
game if $y^2 = x$. If you can always win the game regardless of what your opponent 
does, then you have proven that for every positive x, there exists a positive y such 
that $y^2 = x$. 

Negating a universal statement produces an existential statement. The negation 
of “All swans are white” is not “All swans are not white”, but rather “There is some 
swan which is not white”. Similarly, the negation of “For every $0 < x < \pi/2$, we 
have $\cos(x) \geq 0$” is “For some $0 < x < \pi/2$, we have $\cos(x) < 0$”, not “For every 
$0 < x < \pi/2$, we have $\cos(x) < 0$”. 

Negating an existential statement produces a universal statement. The negation 
of “There exists a black swan” is not “There exists a swan which is non-black”, 
but rather “All swans are non-black”. Similarly, the negation of “There exists a real 
number x such that $x^2 + x + 1 = 0$” is “For every real number x, $x^2 + x + 1 \neq 0$”, 
not “There exists a real number x such that $x^2 + x + 1 \neq 0$”. (The situation here is 
very similar to how “and” and “or” behave with respect to negations.) 
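
The two negation rules can be observed on a finite domain. This is a minimal Python sketch, assuming a finite range of integers as the domain (chosen here for illustration; the text works with real numbers) and a hypothetical helper `p` for the property under discussion.

```python
# Finite stand-in domain (an assumption for illustration).
domain = range(-10, 11)

def p(x):
    """The property 'x^2 + x + 1 = 0'."""
    return x * x + x + 1 == 0

# not (for all x, P(x))  agrees with  (for some x, not P(x))
neg_universal = not all(p(x) for x in domain)
some_counterexample = any(not p(x) for x in domain)

# not (for some x, P(x))  agrees with  (for all x, not P(x))
neg_existential = not any(p(x) for x in domain)
all_fail = all(not p(x) for x in domain)

print(neg_universal == some_counterexample, neg_existential == all_fail)
```

Note that "for some x, not P(x)" and "for all x, not P(x)" are different statements; only the first is the negation of "for all x, P(x)".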

If you know that a statement P(x) is true for all x, then you can set x to be 
anything you want, and P(x) will be true for that value of x; this is what “for all” 
means. Thus for instance if you know that 

$(x + 1)^2 = x^2 + 2x + 1$ for all real numbers x, 

then you can conclude for instance that 

$(\pi + 1)^2 = \pi^2 + 2\pi + 1$, 

or for instance that 

$(\cos(y) + 1)^2 = \cos(y)^2 + 2\cos(y) + 1$ for all real numbers y 

(because if y is real, then cos(y) is also real), and so forth. Thus universal statements 
are very versatile in their applicability: you can get P(x) to hold for whatever x 
you wish. Existential statements, by contrast, are more limited; if you know that 

$x^2 + 2x - 8 = 0$ for some real number x, 

then you cannot simply substitute in any real number you wish, e.g., $\pi$, and conclude 
that $\pi^2 + 2\pi - 8 = 0$. However, you can of course still conclude that $x^2 + 2x - 8 = 0$ 
for some real number x; it’s just that you don’t get to pick which x it is. (To continue 
the gaming metaphor, you can make P(x) hold, but your opponent gets to pick x for 
you; you don’t get to choose for yourself.) 

Remark A.5.1 In the history of logic, quantifiers were formally studied thousands 
of years before Boolean logic was. Indeed, Aristotelian logic, developed of course by 
Aristotle (384–322 BC) and his school, deals with objects, their properties, and 
quantifiers such as “for all” and “for some”. A typical line of reasoning (or syllogism) 
in Aristotelian logic goes like this: “All men are mortal. Socrates is a man. Hence, 
Socrates is mortal”. Aristotelian logic is a subset of mathematical logic, but is not 
as expressive because it lacks the concept of logical connectives such as and, or, or 
if-then (although “not” is allowed) and also lacks the concept of a binary relation 
such as = or <. 


Swapping the order of two quantifiers may or may not make a difference to the 
truth of a statement. Swapping two “for all” quantifiers is harmless: a statement such 
as 

For all real numbers a, and for all real numbers b, we have $(a + b)^2 = a^2 + 2ab + b^2$ 

is logically equivalent to the statement 

For all real numbers b, and for all real numbers a, we have $(a + b)^2 = a^2 + 2ab + b^2$ 

(why? The reason has nothing to do with whether the identity $(a + b)^2 = a^2 + 2ab + b^2$ 
is actually true or not). Similarly, swapping two “there exists” quantifiers has no 
effect: 

There exists a real number a, and there exists a real number b, such that $a^2 + b^2 = 0$ 

is logically equivalent to 

There exists a real number b, and there exists a real number a, such that $a^2 + b^2 = 0$. 


However, swapping a “for all” with a “there exists” makes a lot of difference. 
Consider the following two statements: 


(a) For every integer n, there exists an integer m which is larger than n. 
(b) There exists an integer m such that m is larger than n for every integer n. 


Statement (a) is obviously true: if your opponent hands you an integer n, you 
can always find an integer m which is larger than n. But Statement (b) is false: if 
you choose m first, then you cannot ensure that m is larger than every integer n; 
your opponent can easily pick a number n bigger than m to defeat that. The crucial 
difference between the two statements is that in Statement (a), the integer n was 
chosen first, and integer m could then be chosen in a manner depending on n; but in 
Statement (b), one was forced to choose m first, without knowing in advance what 
n is going to be. In short, the reason why the order of quantifiers is important is that 
the inner variables may possibly depend on the outer variables, but not vice versa. 
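
The dependence of the inner variable on the outer one can be pictured as a strategy, that is, a function of the moves already made. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and the finite sample of integers is only evidence for the infinite statements, not a proof of them.

```python
def reply_for_a(n):
    """Your move in Statement (a): you see n first, then answer with m = n + 1."""
    return n + 1

def reply_for_b(m):
    """Opponent's move in Statement (b): they see your m first, then pick n = m + 1."""
    return m + 1

# Finite sample of integers, for illustration only.
sample = range(-1000, 1001)

# (a): whatever n you are handed, your reply m is larger than n.
a_strategy_wins = all(reply_for_a(n) > n for n in sample)

# (b): whatever m you commit to, the opponent's n makes "m > n" fail.
b_always_defeated = all(not (m > reply_for_b(m)) for m in sample)

print(a_strategy_wins, b_always_defeated)
```

In (a) the witness m is a function of n; in (b) a single m would have to work against every n, and the opponent's reply shows that no such m exists.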


— Exercises — 


Exercise A.5.1 What does each of the following statements mean, and which of them are true? Can 
you find gaming metaphors for each of these statements? 


(a) For every positive number x, and every positive number y, we have $y^2 = x$. 
(b) There exists a positive number x such that for every positive number y, we have $y^2 = x$. 
(c) There exists a positive number x, and there exists a positive number y, such that $y^2 = x$. 
(d) For every positive number y, there exists a positive number x such that $y^2 = x$. 
(e) There exists a positive number y such that for every positive number x, we have $y^2 = x$. 


A.6 Some Examples of Proofs and Quantifiers 


Here we give some simple examples of proofs involving the “for all” and “there 
exists” quantifiers. The results themselves are simple, but you should pay attention 
instead to how the quantifiers are arranged and how the proofs are structured. 


Proposition A.6.1 For every $\varepsilon > 0$ there exists a $\delta > 0$ such that $2\delta < \varepsilon$. 


Proof Let $\varepsilon > 0$ be arbitrary. We have to show that there exists a $\delta > 0$ such that 
$2\delta < \varepsilon$. We only need to pick one such $\delta$; choosing $\delta := \varepsilon/3$ will work, since one 
then has $2\delta = 2\varepsilon/3 < \varepsilon$. 


Notice how $\varepsilon$ has to be arbitrary, because we are proving something for every $\varepsilon$; 
on the other hand, $\delta$ can be chosen as you wish, because you only need to show that 
there exists a $\delta$ which does what you want. Note also that $\delta$ can depend on $\varepsilon$, because 
the $\delta$-quantifier is nested inside the $\varepsilon$-quantifier. If the quantifiers were reversed, i.e., 
if you were asked to prove “There exists a $\delta > 0$ such that for every $\varepsilon > 0$, $2\delta < \varepsilon$”, 
then you would have to select $\delta$ first before being given $\varepsilon$. In this case it is impossible 
to prove the statement, because it is false (why?). 
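
The witness in the proof of Proposition A.6.1 is a rule that turns each $\varepsilon$ into a $\delta$. A minimal Python sketch of that rule follows; the function name `choose_delta` and the particular sample values are assumptions made for illustration, and exact rational arithmetic is used so that no rounding enters the comparison.

```python
from fractions import Fraction

def choose_delta(eps):
    """Witness from the proof: delta := eps / 3. It is allowed to depend on eps."""
    return eps / 3

# A few sample values of eps > 0 (arbitrary choices, for illustration only).
samples = [Fraction(1, 1000), Fraction(1, 2), Fraction(1), Fraction(17, 3), Fraction(10**6)]

# For each sampled eps, the chosen delta is positive and satisfies 2 * delta < eps.
checks = [choose_delta(eps) > 0 and 2 * choose_delta(eps) < eps for eps in samples]

print(all(checks))
```

The samples only spot-check the rule; the proof itself covers every $\varepsilon > 0$ at once because $\varepsilon$ was left arbitrary.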

Normally, when one has to prove a “There exists...” statement, e.g., “Prove that 
there exists an $\varepsilon > 0$ such that X is true”, one proceeds by selecting $\varepsilon$ carefully, and 
then showing that X is true for that $\varepsilon$. However, this sometimes requires a lot of 
foresight, and it is legitimate to defer the selection of $\varepsilon$ until later in the argument, 
when it becomes clearer what properties $\varepsilon$ needs to satisfy. The only thing to watch 
out for is to make sure that $\varepsilon$ does not depend on any of the bound variables nested 
inside X. For instance: 


Proposition A.6.2 There exists an $\varepsilon > 0$ such that $\sin(x) > x/2$ for all $0 < x < \varepsilon$. 


Proof We pick $\varepsilon > 0$ to be chosen later, and let $0 < x < \varepsilon$. Since the derivative of 
sin(x) is cos(x), we see from the mean-value theorem (Corollary 10.2.9) that 

$$\frac{\sin(x)}{x} = \frac{\sin(x) - \sin(0)}{x - 0} = \cos(y)$$ 

for some $0 < y < x$. Thus in order to ensure that $\sin(x) > x/2$, it would suffice to 
ensure that $\cos(y) > 1/2$. To do this, it would suffice to ensure that $0 < y < \pi/3$ 
(since the cosine function takes the value of 1 at 0, takes the value of 1/2 at $\pi/3$, and 
is decreasing in between). Since $0 < y < x$ and $0 < x < \varepsilon$, we see that $0 < y < \varepsilon$. 
Thus if we pick $\varepsilon := \pi/3$, then we have $0 < y < \pi/3$ as desired, and so we can 
ensure that $\sin(x) > x/2$ for all $0 < x < \varepsilon$. 


Note that the value of $\varepsilon$ that we picked at the end did not depend on the nested 
variables x and y. This makes the above argument legitimate. Indeed, we can rearrange 
it so that we don’t have to postpone anything: 


Proof We choose $\varepsilon := \pi/3$; clearly $\varepsilon > 0$. Now we have to show that for all 
$0 < x < \pi/3$, we have $\sin(x) > x/2$. So let $0 < x < \pi/3$ be arbitrary. By the mean-value 
theorem we have 

$$\frac{\sin(x)}{x} = \frac{\sin(x) - \sin(0)}{x - 0} = \cos(y)$$ 

for some $0 < y < x$. Since $0 < y < x$ and $0 < x < \pi/3$, we have $0 < y < \pi/3$. 
Thus $\cos(y) > \cos(\pi/3) = 1/2$, since cos is decreasing on the interval $[0, \pi/3]$. 
Thus we have $\sin(x)/x > 1/2$ and hence $\sin(x) > x/2$ as desired. 


If we had chosen $\varepsilon$ to depend on x and y then the argument would not be valid, 
because $\varepsilon$ is the outer variable and x, y are nested inside it. 
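
As a numerical sanity check of Proposition A.6.2 (not a substitute for the proof), one can sample points in the interval $(0, \pi/3)$ and test the inequality in floating-point arithmetic. The sampling grid below is an arbitrary choice made for this sketch.

```python
import math

# The value of epsilon chosen in the proof; it does not depend on x or y.
eps = math.pi / 3

# Sample points strictly inside (0, eps). A finite sample is evidence, not a proof.
xs = [eps * k / 1000 for k in range(1, 1000)]

# The claimed inequality sin(x) > x/2 on the sample.
holds_on_sample = all(math.sin(x) > x / 2 for x in xs)

# The mean-value step: sin(x)/x = cos(y) for some 0 < y < x, and cos(y) > 1/2.
ratio_above_half = all(math.sin(x) / x > 0.5 for x in xs)

print(holds_on_sample, ratio_above_half)
```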


A.7 Equality 


As mentioned before, one can create statements by starting with expressions (such 
as $2 \times 3 + 5$) and then asking whether an expression obeys a certain property, or 
whether two expressions are related by some sort of relation ($=$, $<$, $\in$, etc.). There 
are many relations, but the most important one is equality, and it is worth spending 
a little time reviewing this concept. 

Equality is a relation linking two objects x, y of the same type T (e.g., two integers, 
or two matrices, or two vectors, etc.). Given two such objects x and y, the statement 
x = y may or may not be true; it depends on the value of x and y and also on how 
equality is defined for the class of objects under consideration. For instance, as real 
numbers, the two numbers 0.9999... and 1 are equal. In modular arithmetic with 
modulus 10 (in which numbers are considered equal to their remainders modulo 10), 
the numbers 12 and 2 are considered equal, 12 = 2, even though this is not the case 
in ordinary arithmetic. 

How equality is defined depends on the class T of objects under consideration, 
and to some extent is just a matter of definition. However, for the purposes of logic 
we require that equality obeys the following four axioms of equality: 

• (Reflexive axiom). Given any object x, we have x = x. 
• (Symmetry axiom). Given any two objects x and y of the same type, if x = y, then 
  y = x. 
• (Transitive axiom). Given any three objects x, y, z of the same type, if x = y and 
  y = z, then x = z. 
• (Substitution axiom). Given any two objects x and y of the same type, if x = y, 
  then f(x) = f(y) for all functions or operations f. Similarly, for any property 
  P(x) depending on x, if x = y, then P(x) and P(y) are equivalent statements. 


The first three axioms are clear; together, they assert that equality is an equivalence 
relation. To illustrate the substitution axiom we give some examples. 


Example A.7.1 Let x and y be real numbers. If x = y, then 2x = 2y, and sin(x) = 
sin(y). Furthermore, x + z = y + z for any real number z. 


Example A.7.2 Let n and m be integers. If n is odd and n = m, then m must also be 
odd. If we have a third integer k, and we know that n > k and n = m, then we also 
know that m > k. 


Example A.7.3 Let x, y, z be real numbers. If we know that $x = \sin(y)$ and $y = z^2$, 
then (by the first form of the substitution axiom) we have $\sin(y) = \sin(z^2)$, and hence 
(by the transitive axiom) we have $x = \sin(z^2)$. One can also obtain the conclusion 
$x = \sin(z^2)$ more directly by using the second form of the substitution axiom. 


Thus, from the point of view of logic, we can define equality on a class of objects 
however we please, so long as it obeys the reflexive, symmetry, and transitive axioms, 
and is consistent with all other operations on the class of objects under discussion in 
the sense that the substitution axiom holds for all of those operations. For instance, 
if we decided one day to modify the integers so that 12 was now equal to 2, one could 
only do so if one also made sure that 2 was now equal to 12, and that f(2) = f(12) 
for any operation f on these modified integers. For instance, we now need 2 + 5 to 
be equal to 12 + 5. (In this case, pursuing this line of reasoning will eventually lead 
to modular arithmetic with modulus 10.) 
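
To make the modulus-10 example concrete, the Python sketch below models the modified equality as a predicate `equal_mod` and tests the four axioms on a finite sample of integers. All names are hypothetical, the sample range is an arbitrary choice, and floor division by 4 is included only as an example of an operation that the modified equality does not respect.

```python
MODULUS = 10

def equal_mod(a, b):
    """Modified equality: a and b count as equal when they differ by a multiple of 10."""
    return (a - b) % MODULUS == 0

# Finite sample of integers (an arbitrary choice for illustration).
sample = range(-30, 31)

reflexive = all(equal_mod(x, x) for x in sample)
symmetric = all(equal_mod(y, x) for x in sample for y in sample if equal_mod(x, y))
transitive = all(
    equal_mod(x, z)
    for x in sample for y in sample for z in sample
    if equal_mod(x, y) and equal_mod(y, z)
)

def respects_equality(f):
    """Substitution check on the sample: x = y should force f(x) = f(y)."""
    return all(
        equal_mod(f(x), f(y))
        for x in sample for y in sample
        if equal_mod(x, y)
    )

add_five_ok = respects_equality(lambda x: x + 5)   # 2 + 5 equals 12 + 5
triple_ok = respects_equality(lambda x: 3 * x)
quarter_ok = respects_equality(lambda x: x // 4)   # 2 // 4 = 0 but 12 // 4 = 3

print(reflexive, symmetric, transitive, add_five_ok, triple_ok, quarter_ok)
```

Addition and multiplication pass the substitution check, whereas floor division by 4 does not; such an operation cannot be carried over unchanged to the modified integers.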

For most applications in analysis, one should not need to compare objects of 
different types: for instance, if x is a set, and y is a number, then one should not need 
to consider the question of whether x = y is true or false. But for the purposes of 
doing set theory, it is convenient to adopt the convention that the statement x = y is 
automatically false if x, y are of different types; for instance, if one is treating natural 
numbers and vectors as objects of different types, then a natural number would not be 
equal to a vector. But sometimes we override this convention by identifying objects 
of one type with some objects of another type, e.g., when we identified natural 
numbers with their counterparts in the integers, or integers with their counterparts 
in the rationals, and so forth. This is technically an “abuse of notation”, but can be 
tolerated as long as one verifies that no violation of the axioms of equality occurs by 
doing so. We will sometimes use the notation $x \equiv y$ to indicate that a mathematical 
object x is being identified with a mathematical object y. 


— Exercises — 


Exercise A.7.1 Suppose you have four real numbers a, b, c, d and you know that a = b and c = d. 
Use the above four axioms to deduce that a + d = b + c. 


Appendix B 
The Decimal System 


In Chaps. 2, 4, and 5 we painstakingly constructed the basic number systems of 
mathematics: the natural numbers, integers, rationals, and reals. Natural numbers 
were simply postulated to exist, and to obey five axioms; the integers then came via 
(formal) differences of the natural numbers; the rationals then came from (formal) 
quotients of the integers; and the reals then came from (formal) limits of the rationals. 

This is all very well and good, but it does seem somewhat alien to one’s prior 
experience with these numbers. In particular, very little use was made of the decimal 
system, in which the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are combined to represent these 
numbers. Indeed, except for a number of examples which were not essential to the 
main construction, the only decimals we really used were the numbers 0, 1, and 2, 
and the latter two can be rewritten as 0++ and (0++)++. 

The basic reason for this is that the decimal system itself is not essential to math- 
ematics. It is very convenient for computations, and we have grown accustomed to 
it thanks to a thousand years of use, but in the history of mathematics it is actually a 
comparatively recent invention. Numbers have been around for about ten thousand 
years (starting from scratch marks on cave walls), but the modern Hindu-Arabic 
base 10 system for representing numbers only dates from the eleventh century or so. 
Some early civilizations relied on other bases; for instance the Babylonians used a 
base 60 system (which still survives in our time system of hours, minutes, and sec- 
onds, and in our angular system of degrees, minutes, and seconds). And the ancient 
Greeks were able to do quite advanced mathematics, despite the fact that the most 
advanced number representation system available to them was the Roman numeral 
system I, II, III, IV, ..., which was horrendous for computations of even two-digit 
numbers. And of course modern computing relies on binary, hexadecimal, or 
byte-based (base 256) arithmetic instead of decimal, while analog computers such 
as the slide rule do not really rely on any number representation system at all. In fact, 
now that computers can do the menial work of number-crunching, there is very little 
use for decimals in modern mathematics. Indeed, we rarely use any numbers other 
than one-digit numbers or one-digit fractions (as well as $e$, $\pi$, $i$) explicitly in modern 
mathematical work; any more complicated numbers usually get called more generic 
names such as n. 

Nevertheless, the subject of decimals does deserve an appendix, because it is so 
integral to the way we use mathematics in our everyday life, and also because we do 
want to use such notation as 3.14159... to refer to real numbers, as opposed to the 
far clunkier “$\mathrm{LIM}_{n \to \infty} a_n$, where $a_1 := 3.1$, $a_2 := 3.14$, $a_3 := 3.141$, ...”. 

We begin by reviewing how the decimal system works for the positive integers 
and then turn to the reals. Note that in this discussion we shall freely use all the 
results from earlier chapters. 


B.1 The Decimal Representation of Natural Numbers 


In this section we will avoid the usual convention of abbreviating $a \times b$ as ab, since 
this would mean that decimals such as 34 might be misconstrued as $3 \times 4$. 


Definition B.1.1 (Digits) A digit is any one of the ten symbols 0, 1, 2, 3, ..., 9. We 
equate these digits with natural numbers by the formulae 0 = 0, 1 = 0++, 2 = 1++, 
etc. all the way up to 9 = 8++. We also define the number ten by the formula 
ten := 9++. (We cannot use the decimal notation 10 to denote ten yet, because that 
presumes knowledge of the decimal system and would be circular.) 


Definition B.1.2 (Positive integer decimals) A positive integer decimal is any string 
$a_n a_{n-1} \ldots a_0$ of digits, where $n \geq 0$ is a natural number, and the first digit $a_n$ is 
non-zero. Thus, for instance, 3049 is a positive integer decimal, but 0493 or 0 is not. We 
equate each positive integer decimal with a positive integer by the formula 

$$a_n a_{n-1} \ldots a_0 = \sum_{i=0}^{n} a_i \times \mathrm{ten}^i.$$ 


Remark B.1.3 Note in particular that this definition implies that 

$$10 = 0 \times \mathrm{ten}^0 + 1 \times \mathrm{ten}^1 = \mathrm{ten}$$ 

and thus we can write ten as the more familiar 10. Also, a single-digit integer decimal 
is exactly equal to that digit itself, e.g., the decimal 3 by the above definition is equal 
to 

$$3 = 3 \times \mathrm{ten}^0 = 3$$ 

so there is no possibility of confusion between a single digit and a single-digit decimal. 
(This is a subtle distinction, and not one which is worth losing much sleep over.) 
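
Definition B.1.2 can be read as a recipe for turning a string of digits into a number. The following Python sketch is one way to spell that recipe out; the function names are hypothetical, and Python's built-in integers stand in for the natural numbers constructed in the text.

```python
TEN = 9 + 1  # "ten" taken as the successor of nine

DIGITS = "0123456789"

def is_positive_integer_decimal(s):
    """True if s is a non-empty string of digits whose first digit is non-zero."""
    return len(s) > 0 and all(ch in DIGITS for ch in s) and s[0] != "0"

def decimal_value(s):
    """Value of the string a_n ... a_0, computed as the sum of a_i * ten^i."""
    if not is_positive_integer_decimal(s):
        raise ValueError("not a positive integer decimal: " + repr(s))
    digits = [DIGITS.index(ch) for ch in s]  # a_n, ..., a_0 from left to right
    n = len(digits) - 1
    # The digit at position k from the left is a_(n-k), so its weight is ten^(n-k).
    return sum(a * TEN ** (n - k) for k, a in enumerate(digits))

print(decimal_value("3049"), decimal_value("10") == TEN, decimal_value("3"))
```

The strings 0493 and 0 are rejected by `is_positive_integer_decimal`, matching the definition's requirement that the leading digit be non-zero.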


Now we show that this decimal system indeed represents the positive integers. It 
is clear from the definition that every positive decimal representation gives a positive 
integer, since the sum consists entirely of natural numbers, and the last term $a_n \times \mathrm{ten}^n$ 
is non-zero by definition. 


Theorem B.1.4 (Uniqueness and existence of decimal representations) Every positive 
integer m is equal to exactly one positive integer decimal. 


Proof We shall use the principle of strong induction (Proposition 2.2.14, with $m_0 := 1$). 
For any positive integer m, let P(m) denote the statement “m is equal to exactly 
one positive integer decimal”. Suppose we already know $P(m')$ is true for all positive 
integers $m' < m$; we now wish to prove P(m). 

First observe that either $m \geq \mathrm{ten}$ or $m \in \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$. (This is easily 
proved by ordinary induction.) Suppose first that $m \in \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$. Then 
m clearly is equal to a positive integer decimal consisting of a single digit, and 
there is only one single-digit decimal which is equal to m. Furthermore, no decimal 
consisting of two or more digits can equal m, since if $a_n \ldots a_0$ is such a decimal (with 
$n > 0$) we have 

$$a_n \ldots a_0 = \sum_{i=0}^{n} a_i \times \mathrm{ten}^i \geq a_n \times \mathrm{ten}^n \geq \mathrm{ten} > m.$$ 


Now suppose that $m \geq \mathrm{ten}$. Then by the Euclidean algorithm (Proposition 2.3.9) 
we can write 

$$m = s \times \mathrm{ten} + r$$ 

where s is a positive integer, and $r \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$. Since 

$$s < s \times \mathrm{ten} \leq s \times \mathrm{ten} + r = m$$ 

we can use the strong induction hypothesis and conclude that P(s) is true. In particular, 
s has a decimal representation 

$$s = b_p \ldots b_0 = \sum_{i=0}^{p} b_i \times \mathrm{ten}^i.$$ 

Multiplying by ten, we see that 

$$s \times \mathrm{ten} = \sum_{i=0}^{p} b_i \times \mathrm{ten}^{i+1} = b_p \ldots b_0 0,$$ 

and then adding r we see that 

$$m = s \times \mathrm{ten} + r = \sum_{i=0}^{p} b_i \times \mathrm{ten}^{i+1} + r = b_p \ldots b_0 r.$$ 

Thus m has at least one decimal representation. Now we need to show that m has at 
most one decimal representation. Suppose for sake of contradiction that we have at 
least two different representations 

$$m = a_n \ldots a_0 = a'_{n'} \ldots a'_0.$$ 

First observe by the previous computation that 

$$a_n \ldots a_0 = (a_n \ldots a_1) \times \mathrm{ten} + a_0$$ 

and 

$$a'_{n'} \ldots a'_0 = (a'_{n'} \ldots a'_1) \times \mathrm{ten} + a'_0$$ 

and so after some algebra we obtain 

$$a'_0 - a_0 = (a_n \ldots a_1 - a'_{n'} \ldots a'_1) \times \mathrm{ten}.$$ 

The right-hand side is a multiple of ten, while the left-hand side lies strictly between 
−ten and +ten. Thus both sides must be equal to 0. This means that $a_0 = a'_0$ and 
$a_n \ldots a_1 = a'_{n'} \ldots a'_1$. But by previous arguments, we know that $a_n \ldots a_1$ is a smaller 
integer than $a_n \ldots a_0$. Thus by the strong induction hypothesis, the number $a_n \ldots a_1$ 
has only one decimal representation, which means that $n'$ must equal n and $a'_i$ must 
equal $a_i$ for all $i = 1, \ldots, n$. Thus the decimals $a_n \ldots a_0$ and $a'_{n'} \ldots a'_0$ are in fact 
identical, contradicting the assumption that they were different. 
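
The existence half of the proof is effectively a recursive procedure: split m as s × ten + r, represent s, and append r. A minimal Python sketch of that procedure follows, assuming Python integers in place of the natural numbers of the text and using the hypothetical name `decimal_digits`; the built-in `divmod` plays the role of the Euclidean algorithm.

```python
def decimal_digits(m):
    """Digits a_n, ..., a_0 of a positive integer m, following the existence proof."""
    if m <= 0:
        raise ValueError("m must be a positive integer")
    if m < 10:
        return [m]                    # single-digit case: m is in {1, ..., 9}
    s, r = divmod(m, 10)              # m = s * ten + r with r in {0, ..., 9}
    return decimal_digits(s) + [r]    # s < m, so the recursion terminates

print(decimal_digits(3049))
```

The recursion mirrors strong induction: the representation of m is built from the representation of the strictly smaller number s.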


We refer to the decimal given by the above theorem as the decimal representation 
of m. Once one has this decimal representation, one can then derive the usual laws of 
long addition and long multiplication to connect the decimal representation of x + y 
or $x \times y$ to those of x and y (Exercise B.1.1). 

Once one has the decimal representation of positive integers, one can of course 
represent negative integers decimally as well by using the − sign. Finally, we let 0 be a 
decimal as well. This gives decimal representations of all integers. Every rational is 
then the ratio of two decimals, e.g., 335/113 or −1/2 (with the denominator required 
to be non-zero, of course), though there may be more than one way to represent a 
rational as such a ratio, e.g., 6/4 = 3/2. 

Since ten = 10, we will now use 10 instead of ten throughout, as is customary. 


— Exercises — 


Exercise B.1.1 The purpose of this exercise is to demonstrate that the procedure of long addition 
taught to you in elementary school is actually valid. Let $A = a_n \ldots a_0$ and $B = b_m \ldots b_0$ be positive 
integer decimals. Let us adopt the convention that $a_i = 0$ when $i > n$, and $b_i = 0$ when $i > m$; for 
instance, if A = 372, then $a_0 = 2$, $a_1 = 7$, $a_2 = 3$, $a_3 = 0$, $a_4 = 0$, and so forth. Define the numbers 
$c_0, c_1, \ldots$ and $\varepsilon_0, \varepsilon_1, \ldots$ recursively by the following long addition algorithm. 


• We set $\varepsilon_0 := 0$. 
• Now suppose that $\varepsilon_i$ has already been defined for some $i \geq 0$. If $a_i + b_i + \varepsilon_i < 10$, we set 
  $c_i := a_i + b_i + \varepsilon_i$ and $\varepsilon_{i+1} := 0$; otherwise, if $a_i + b_i + \varepsilon_i \geq 10$, we set 
  $c_i := a_i + b_i + \varepsilon_i - 10$ and $\varepsilon_{i+1} := 1$. (The number $\varepsilon_{i+1}$ is the “carry digit” from the 
  $i^{\text{th}}$ decimal place to the $(i + 1)^{\text{th}}$ decimal place.) 


Prove that the numbers $c_0, c_1, \ldots$ are all digits, and that there exists an l such that $c_l \neq 0$ and 
$c_i = 0$ for all $i > l$. Then show that $c_l c_{l-1} \ldots c_1 c_0$ is the decimal representation of A + B. 

Note that one could in fact use this algorithm to define addition, but it would look extremely 
complicated, and to prove even such simple facts as (a + b) + c = a + (b + c) would be rather 
difficult. This is one of the reasons why we have avoided the decimal system in our construction of 
the natural numbers. The procedure for long multiplication (or long subtraction, or long division) 
is even worse to lay out rigorously; we will not do so here. 
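
The long addition algorithm of Exercise B.1.1 translates almost line by line into code. The Python sketch below is an illustration, not a solution to the exercise: it represents a decimal as a list of digits with the least significant digit first (a choice made here so that index i matches $a_i$), and the names `long_add` and `to_digits` are hypothetical.

```python
def long_add(a_digits, b_digits):
    """Add two decimals given as digit lists [a_0, a_1, ...], least significant first."""
    length = max(len(a_digits), len(b_digits)) + 1   # one extra place for a final carry
    result = []
    carry = 0                                        # epsilon_0 := 0
    for i in range(length):
        a_i = a_digits[i] if i < len(a_digits) else 0   # convention: a_i = 0 for i > n
        b_i = b_digits[i] if i < len(b_digits) else 0   # convention: b_i = 0 for i > m
        total = a_i + b_i + carry
        if total < 10:
            c_i, carry = total, 0
        else:
            c_i, carry = total - 10, 1
        result.append(c_i)
    while len(result) > 1 and result[-1] == 0:       # drop the zeros beyond c_l
        result.pop()
    return result

def to_digits(m):
    """Least-significant-first digit list of a positive integer (helper for checking)."""
    return [int(ch) for ch in reversed(str(m))]

print(long_add(to_digits(372), to_digits(958)))
```

Comparing the output with Python's own integer addition on many inputs gives evidence that the algorithm is right, but the exercise asks for a proof, which such testing does not supply.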


B.2 The Decimal Representation of Real Numbers 


We need a new symbol: the decimal point “.”. 


Definition B.2.1 (Real decimals) A real decimal is any sequence of digits, and a 
decimal point, arranged as 

$$\pm a_n \ldots a_0 . a_{-1} a_{-2} \ldots$$ 

which is finite to the left of the decimal point (so n is a natural number), but infinite 
to the right of the decimal point, where $\pm$ is either + or −, and $a_n \ldots a_0$ is a natural 
number decimal (i.e., either a positive integer decimal, or 0). This decimal is equated 
to the real number 

$$\pm a_n \ldots a_0 . a_{-1} a_{-2} \ldots = \pm 1 \times \sum_{i=-\infty}^{n} a_i \times 10^i.$$ 

The series is always convergent (Exercise B.2.1). Next, we show that every real 
number has at least one decimal representation: 


Theorem B.2.2 (Existence of decimal representations) Every real number x has at 
least one decimal representation 

$$x = \pm a_n \ldots a_0 . a_{-1} a_{-2} \ldots.$$ 


Proof We first note that x = 0 has the decimal representation 0.000.... Also, once 
we find a decimal representation for x, we automatically get a decimal representation 
for −x by changing the sign $\pm$. Thus it suffices to prove the theorem for positive real 
numbers x (by Proposition 5.4.4). 

Let $n \geq 0$ be any natural number. From the Archimedean property (Corollary 
5.4.13) we know that there is a natural number M such that $M \times 10^{-n} > x$. Since 
$0 \times 10^{-n} \leq x$, we thus see that there must exist a natural number $s_n$ such that 
$s_n \times 10^{-n} \leq x$ and $(s_n\text{++}) \times 10^{-n} > x$. (If no such natural number existed, one could use 
induction to conclude that $s \times 10^{-n} \leq x$ for all natural numbers s, contradicting the 
Archimedean property.) 

Now consider the sequence $s_0, s_1, s_2, \ldots$. Since we have 

$$s_n \times 10^{-n} \leq x < (s_n + 1) \times 10^{-n}$$ 

we thus have 

$$(10 \times s_n) \times 10^{-(n+1)} \leq x < (10 \times s_n + 10) \times 10^{-(n+1)}.$$ 

On the other hand, we have 

$$s_{n+1} \times 10^{-(n+1)} \leq x < (s_{n+1} + 1) \times 10^{-(n+1)}$$ 

and hence we have 

$$10 \times s_n < s_{n+1} + 1 \quad \text{and} \quad s_{n+1} < 10 \times s_n + 10.$$ 

From these two inequalities we see that we have 

$$10 \times s_n \leq s_{n+1} \leq 10 \times s_n + 9$$ 

and hence we can find a digit $a_{n+1}$ such that 

$$s_{n+1} = 10 \times s_n + a_{n+1}$$ 

and hence 

$$s_{n+1} \times 10^{-(n+1)} = s_n \times 10^{-n} + a_{n+1} \times 10^{-(n+1)}.$$ 

From this identity and induction, we can obtain the formula 

$$s_n \times 10^{-n} = s_0 + \sum_{i=1}^{n} a_i \times 10^{-i}.$$ 

Now we take limits of both sides (using Exercise B.2.1) to obtain 

$$\lim_{n \to \infty} s_n \times 10^{-n} = s_0 + \sum_{i=1}^{\infty} a_i \times 10^{-i}.$$ 

On the other hand, we have 

$$x - 10^{-n} \leq s_n \times 10^{-n} \leq x$$ 

for all n, so by the squeeze test (Corollary 6.4.14) we have 

$$\lim_{n \to \infty} s_n \times 10^{-n} = x.$$ 

Thus we have 

$$x = s_0 + \sum_{i=1}^{\infty} a_i \times 10^{-i}.$$ 

Since $s_0$ already has a positive integer decimal representation by Theorem B.1.4, we 
thus see that x has a decimal representation as desired. 
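
The construction in the proof can be carried out exactly for rational x. In the Python sketch below, which is an illustration under stated assumptions rather than part of the proof, `s(x, n)` computes the natural number $s_n$ as a floor, and each digit is recovered from $a_{n+1} = s_{n+1} - 10 \times s_n$. The function names are hypothetical, and `fractions.Fraction` is used so that no floating-point rounding interferes.

```python
from fractions import Fraction
from math import floor

def s(x, n):
    """The natural number s_n with s_n * 10^(-n) <= x < (s_n + 1) * 10^(-n), for x > 0."""
    return floor(x * 10 ** n)

def expansion(x, places):
    """Return (s_0, [a_1, ..., a_places]) where a_(n+1) = s_(n+1) - 10 * s_n."""
    digits = [s(x, n + 1) - 10 * s(x, n) for n in range(places)]
    return s(x, 0), digits

x = Fraction(22, 7)
s0, digits = expansion(x, 6)
print(s0, digits)  # 3 [1, 4, 2, 8, 5, 7]
```

For a terminating decimal this construction always produces the representation ending in zeros, for example 1.000... rather than 0.999..., because $s_n$ is chosen with $s_n \times 10^{-n} \leq x$.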


There is however one slight flaw with the decimal system: it is possible for one 
real number to have two decimal representations. 


Proposition B.2.3 (Failure of uniqueness of decimal representations) The number 
1 has two different decimal representations: 1.000... and 0.999.... 


Proof The representation 1 = 1.000... is clear. Now let’s compute 0.999.... By 
definition, this is the limit of the Cauchy sequence 


0.9, 0.99, 0.999, 0.9999, .... 


But this sequence has 1 as a formal limit by Proposition 5.2.8. 


It turns out that these are the only two decimal representations of 1 (Exercise 
B.2.2). In fact, as it turns out, all real numbers have either one or two decimal 
representations: two if the real is a terminating decimal, and one otherwise (Exercise 
B.2.3). 
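
The convergence of 0.9, 0.99, 0.999, ... to 1 can be made quantitative: the gap between 1 and the n-th term is exactly $10^{-n}$. The Python sketch below checks this identity in exact rational arithmetic for a range of n; the function name `nines` is a hypothetical helper introduced for this illustration.

```python
from fractions import Fraction

def nines(n):
    """The partial sum 0.99...9 with n nines, as an exact rational number."""
    return sum(Fraction(9, 10 ** i) for i in range(1, n + 1))

# The gap 1 - 0.99...9 for n = 1, ..., 7; each gap equals 10^(-n).
gaps = [1 - nines(n) for n in range(1, 8)]

print(gaps[0], gaps[1], gaps[2])
```

Since the gaps shrink below any positive bound, the sequence and the constant sequence 1, 1, 1, ... have the same limit.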


— Exercises — 


Exercise B.2.1 If $a_n \ldots a_0 . a_{-1} a_{-2} \ldots$ is a real decimal, show that the series 
$\sum_{i=-\infty}^{n} a_i \times 10^i$ is absolutely convergent. 

Exercise B.2.2 Show that the only decimal representations 

$$1 = \pm a_n \ldots a_0 . a_{-1} a_{-2} \ldots$$ 

of 1 are 1 = 1.000... and 1 = 0.999.... 

Exercise B.2.3 A real number x is said to be a terminating decimal if we have $x = n/10^{-m}$ for some 
integers n, m. Show that if x is a terminating decimal, then x has exactly two decimal representations, 
while if x is not a terminating decimal, then x has exactly one decimal representation. 

Exercise B.2.4 Rewrite the proof of Corollary 8.3.4 using the decimal system. 
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Extended real number system R*, 102, 115 
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F 
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ordered, 73 
Finite set, 60 
Fixed point theorem, 208 
Forward image:, see image 
Free variable, 278 
Fubini’s theorem 
for finite series, 142 
for infinite series, 165 
See also interchanging integrals/sums 
with integrals/sums 
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implicit definition, 42 
Fundamental theorems of calculus, 256, 258 
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Geometric series, 144, 149 
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Harmonic series, 151 
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identification with rationals, 70 at infinity, 216 
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derivatives with derivatives, 8 of sets, 186 
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Least upper bound, 101 of integers, 64 
least upper bound property, 101, 118 of natural numbers, 24 
See also supremum of rationals, 69, 70 
Leibniz rule, 221 of reals, 90 
Lemma, 20 
Length of interval, 232 
L’H6pital’s rule, 9, 228 N 
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axioms:, see Peano axioms 

informal definition, 13 
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are infinite, 60 

identification with integers, 65 

in set theory:, see Axiom of infinity 

uniqueness of, 57 
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of extended reals, 115 

of integers, 65 

in logic, 269 

of rationals, 69 

of reals, 91 
Negative:, see negation, positive 
Newton’s approximation, 219 
Non-constructive, 175 


O 
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primitive, 39 
One-to-one correspondence:, see bijection 
One-to-one function, 45 
Onto, 45 
Open 
interval, 184 
Or:, see disjunction 
Ordered n-tuple, 52 
Ordered pair, 52 
construction of, 55 
Order ideal, 181 
Ordering 
lexicographical, 181 
of cardinals, 173 
of the extended reals, 115 
of the integers, 68 
of the natural numbers, 22 
of orderings, 182 
of partitions, 234 
of the rationals, 72 
of the reals, 97 
of sets, 177 
Oscillatory discontinuity, 202 


P 

Pair set, 30
Partial function, 51
Partially ordered set, 32, 176
Partial sum, 144
Peano axioms, 14-17
Perfect matching:, see bijection
Permutation, 62
Piecewise 
constant, 235 
constant Riemann-Stieltjes integral, 253 
continuous, 249 
Pigeonhole principle, 61 
Polynomial, 200 
integer, 65
natural number, 22 
rational, 72 
real, 96 
Power set, 49 
Principle of infinite descent, 78 
Principle of mathematical induction, 16 
backwards induction, 24 
strong induction, 23, 178 
Product rule:, see Leibniz rule
Proof 
abstract examples, 275-277, 283-284 
by contradiction, 267, 274 
Proper subset, 31 
Propositional logic, 277


Q 
Quantifier, 278 
existential (for some), 279 
negation of, 281 
nested, 280 
universal (for all), 279 
Quotient rule, 220 
Quotient:, see division 


R 
Range, 47 
Rational numbers Q 
definition, 69 
identification with reals, 91 
interspersing with rationals, 76 
interspersing with reals, 98 
Ratio test, 157 
Real numbers R 
definition, 88 
Rearrangement 
of absolutely convergent series, 153 
of divergent series, 154, 169 
of finite series, 140 
of non-negative series, 152 
Reciprocal 
of rationals, 71 
of reals, 94
Recursive definitions, 19, 57
reductio ad absurdum:, see proof by contradiction
Removable discontinuity:, see removable singularity
Removable singularity, 195, 202 
Restriction of functions, 189 
Riemann hypothesis, 151 
Riemann integrability, 240 
of bounded continuous functions, 248 
closure properties, 242-246 
of continuous functions on compacta, 248
failure of, 251 
of monotone functions, 249 
of piecewise continuous bounded functions, 249
of uniformly continuous functions, 246 
Riemann integral, 240 
upper and lower, 239 
Riemann-Stieltjes integral, 254 
Riemann sums (upper and lower), 242 
Riemann zeta function, 151 
Ring, 67 
commutative, 67 
Rolle’s theorem, 223 
Root, 105 
test, 155 
Russell’s paradox, 38 


S
Scalar multiplication 
of functions, 191 
Schröder-Bernstein theorem, 173
Sequence, 82 
finite, 54 
Series 
on arbitrary sets, 168 
on countable sets, 165 
finite, 135, 137 
formal infinite, 144 
laws, 147, 168 
vs. sum, 136 
Set 
axioms:, see axioms of set theory 
informal definition, 27 
Signum function, 194 
Singleton set, 30 
Singularity, 203 
Square root, 41 
Squeeze test
for sequences, 124
Statement, 266 
Strict upper bound, 178 
Subsequence, 128 
Subset, 31 
proper, 37, 60 
Substitution:, see rearrangement 
Subtraction 
formal (——), 64 
of functions, 190 
of integers, 67 
Successor, 14 
Sum rule, 220 
Supremum (and infimum) 
of a set of extended reals, 117, 118 
of a set of reals, 102, 104 
of sequences of reals, 118 
Surjection:, see onto 


T 
Tangent:, see trigonometric function 
Telescoping series, 148 
Ten, 288 
Theorem, 20 
Totally ordered set, 32, 177 
Transformation:, see function 
Triangle inequality 
for finite series, 137, 141 
in R, 74 
Trichotomy of order 
of extended reals, 116 
for integers, 68 
for natural numbers, 22 
for rationals, 72 
for reals, 96 
Two-to-one function, 45 


U 
Unbounded set, 187 
Uncountability, 159 
of the reals, 171 
Undecidable, 174 
Uniform continuity, 211 
Union, 50 
pairwise, 30 
Universal set, 39 
Upper bound 
of a partially ordered set, 178 
of a set of reals, 100 
see also least upper bound 


V
Variable, 277 
Vertical line test, 40, 56 


W
Well ordering principle 
for arbitrary sets, 182 
for natural numbers, 160 
Well-defined, 266
Well-ordered sets, 178


Z 

Zermelo-Fraenkel(-Choice) axioms, 51
see also axioms of set theory
Zero test
for sequences, 125
for series, 145
Zorn’s lemma, 180